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THE ANALYSIS OF VARIATION IN A NATURAL POPULA- 
TION OF LADY BEETLES 


Frank M. Srurtrevant, Jr., Lyte D. Catvin anp ORLANDO PARK 


From the Division of Biological Research, G. D. Searle & Co., Box 5110, Chicago 80, 
Institute of Statistics, North Carolina State College, Raleigh; 
and Cresap Biological Laboratory, Northwestern University, Evanston 


INTRODUCTION 


Biometric and taxonomic studies of quantitative characters in 
natural populations have led to the discovery of cases of continuous 
geographic variation, or clines, and of subpopulations displaying 
various degrees of reproductive and morphological intergradation. 
Because of the empirical fact that two or more natural entities may 
bear any degree of relationship along the spectrum of reproductive 
isolation, from complete intersterility to complete interfertility, regard- 
less of their absolute or partial geographic coexistence, only an arbitrary 
point can be established on that spectrum for the taxonomic convenience 
of division of such entities into subgroups. The polytypic species 
concept (Mayr, 1942, Chap. 6) is a function of great subjectivity 
when applied to geographically separated, and especially morphologically 
similar, populations. The danger lies in the necessity of classifying 
microevolutionary units for ease of handling and in the implicit assump- 
tion by many taxonomists that such classifications reflect the true 
biological order of nature. Many authors have vainly sought a species 
definition for lack of realization of this latter point (see discussion by 
Gilmour, 1951). 

One of the most objective methods for investigating the systematic 
relationships between evolutionary units is that of Womble (1951), 
even though some subjectivity may enter the picture in the weighting 
of various characters. Fisher (1936) has dealt with such weightings 
by the use of discriminant functions. In order to apply Womble’s 
differential systematics, a good deal of information first must be ac- 
cumulated on the distribution of points on the geno-, pheno-, or ecocline. — 
The genocline is the most fundamental of the three and much work 
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has been pursued on geographic variation in the frequencies of certain 
genes. 

A prime requisite for genocline analysis is the characterization of 
the genetics and variation in a natural population. Shull (1944) has 
published the results of genetic studies on several populations of the 
lady beetle Hippodamia convergens Guer. This species of coccinellids 
is one of the most common of the lady beetles and is found in abundance 
throughout North America. One of its distinguishing characteristics 
is the presence on each elytron, or wing-cover, of six black spots or 
maculations. Mainly through Shull’s efforts, the heredity of the 
elytral maculations in this species and the genetics of interspecies 
relationships have been well-developed (Shull, 1949). A review of the 
literature on the genetics of the Coccinellidae appeared in an earlier 
paper by Shull (1943). 

The purpose of the present paper is to investigate the variation of 
a genetic character in a natural population of Hippodamia convergens, 
illustrating a method for the separation of overlapping distributions of 
phenotypes. 


THE SPOTLESS PHASE 


The elytral maculations of Hippodamia convergens vary greatly in 
size and may be wholly or partly absent, in which case the condition is 
known as the spotless phase. Since the phenotypes of the spotless and 
spotted phases overlap, Shull (1946) devised a method for their separa- 
tion. After many genetic crosses, he found that spotless beetles could 
be identified best by considering only the posterior three maculations, 
which, he observed, tended to be larger than the anterior three in 
spotted beetles and to show a greater frequency of absence in spotless 
beetles. An arbitrary unit for the measurement of the greatest dia- 
meter of a spot was established such that no maculation was scored as 
greater than four; therefore, no total of the three posterior spots could 
exceed 3 X 4 = 12. 

In Shull’s laboratory population, which had been derived from a 
single pair of Michigan beetles (Shull, 1951), it was found that the 
inheritance of spot diameters was such that a posterior total of 7.74 or 
less would divide, on the average, the spotted and spotless phases 
(Shull, 1946). It was further established that the spotless phase was 
governed by “one almost dominant gene’, plus several modifiers 
(Shull, 1944). 

Since Shull, then, had established that the size of the elytral macu- 
lations in this species was under genetic control, it was first necessary 
to examine the variation of the posterior total relative to the anterior 
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total, and second to establish a method for separating the spotless and 
spotted components of the natural population at hand. It was not 
to be expected, on a priori grounds, that the area of overlap would 
be the same for a highly inbred laboratory stock and a natural popu- 
lation, because of the genetic drift which so often accompanies in- 
breeding. 


——————————— 


MATERIAL AND METHODS 


A hibernating population of 1,021 Hippodamia convergens beetles 
was collected in a clump of Calamovilfa grass in the foredune stage of 
the Ogden Dunes, Porter County, Indiana on October 26, 1946, at 
11:00 AM. The right elytra from 1,011 of these specimens were re- 
moved and affixed to slides, ten elytra being damaged in the preparation. 
The greatest diameter of each maculation was then measured micro- 
scopically in units ranging from 0 to 8. 

In accordance with Shull’s observations (1944) that the three 
posterior maculations appeared to act as a unit, totals of the greatest 
diameters of these spots for each elytron were sade and their frequencies 
tabulated (table 1). The same was then done for the three anterior 

' spots on 124 beetles chosen at random. The regression line of the 
anterior sums (y) on the posterior sums (x) for these 124 elytra was 
calculated by least squares: y = 2.17 + 0.3442 (fig. 1). The standard 
deviation of the slope was + 0.108. The coefficient of correlation was 
0.67, with 99 per cent fiducial limits at 0.78 and 0.52. 


, 
i TABLE 1. : fe 
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It was evident that the posterior spots, as a group, tended to be 
larger than the anterior in the mixed sample (fig. 1). The positive 
y-intercept does not mean that individually the posterior three spots 
were most frequently absent. The frequency of absence of each spot, 
numbered anterior to posterior, was calculated and entered in table 2. 
Similarly, of the total occurrence of spot-absence of 381 cases, the 
individual spots accounted for the percentages listed in the last column 
of table 2. The posterior spots were absent more frequently than the 
anterior. These two findings agree with Shull’s observations cited 
earlier. 

With the exception of spot I, the posterior three maculations 
showed a greater frequency of absence and accounted for greater 
fractions of the total spot-absence in the mixed sample. Since, as a 
group, the posterior spots behaved as predicted by Shull’s observations, 
although spot I also exhibited a high tendency to be absent, the fol- 
lowing calculations were based on the posterior sums. Such procedure 


would enable direct comparison of the present results with those of 
Shull. 


ANTERIOR 


te) 
0 2 4 6 8 10 12 14 16 18 20 22 
POSTERIOR 
BIG. 1, 


Regression of the greatest-diameter sums of the three anterior on the three posterior maculations of 

124 right elytra selected at random from the natural population of 1,011 Hippodamia covergens beetles. 

The number of elytra at any one point varies from one to eleven, and is represented by different sized 
circles, 
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TABLE 2, 
Absences of each elytral maculat ber rior i i } | 
a ytre aculation, numbered anterior to posterior, in 124 Hippodamia convergens 
chosen at random from the natural population 


Cases of Absence 
Spot 
Per cent of Per cent of all 
Number possible spots absent 
I 98 79 26 
II 28 23 0 
Ill 35 28 9 
Anterior 161 43 42 
IV 52 42 14 
V 55 44 14 
VI 113 91 30 
Posterior 220 59 58 
Totals 381 100 


ESTIMATION OF POPULATION PARAMETERS 


Employing the data in table 1, a histogram was constructed from 
the frequencies of the sums of the greatest diameters of the three 
posterior maculations on 1,011 right elytra (fig. 2). Since the units of 
measurement used in this paper are twice those employed by Shull, 
the point on the abscissa which would separate the spotless and spotted 
beetles according to his method would be twice his separation point, 
that is, 2 X 7.75 = 15.50. From inspection of figure 2, it is seen that 
this value is much greater than it should be for the present population. 
There was only a slight probability that the separation points of the 
two populations would coincide, because the population at hand was a 
natural one, while Shull’s was a laboratory stock derived from a -smgle 
pair of beetles. Such a cultured stock, especially when small, is highly 
susceptible to the effects of genetic drift, and as a result, one would 
expect a priori a shift in the separation point and mean of the spotted 
beetles. The standard deviation of the spotted population was probably _ 
smaller in Shull’s stock, since he had a separation point at 15.50, when 
the maximal possible value was 24.00 (in our units). Obviously, 
some method for separating the spotted and spotless phases other than 
Shull’s would have to be utilized in the present case. 
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FIG, 2. 


Histogram of frequencies of greatest-diameter sums of the three posterior maculations of 1,011 right 

elytra from a natural population of Hippodamia convergens. Mean (M) and standard deviation (S) 

calculated by truncating the histogram at xo ; the corresponding normal curve is imposed on the 

diagram. Area under the normal curve represents the 897 spotted beetles; remaining area to the left 
represents the 114 spotless beetles. 


The histogram in figure 2 is obviously bimodal, and the population 
to the right (the spotted beetles) seemed to describe a normal curve. 
It was reasoned therefore that the two populations could be separated 
best by cutting the curve at an arbitrary truncation point (x), beyond 
which it would be highly improbable that any spotless beetles would 
fall, and then calculating the mean and standard deviation of the 
truncated normal curve. From these statistics, the size of the popu- 
lation under the truncated curve could be calculated. The estimation 
of the total numbers of spotted and spotless beetles from these data 
would then be a relatively simple step. The method used for working 
with such a truncated normal curve with unknown population size 
was that of Cohen (1949). His notation has been followed in the 
present paper with the exception that the primes have been omitted. 


aes 
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Fitting the Normal Curve 


The point of truncation was selected by inspection as x) = 9.5. 
This point is an arbitrary choice and might have been taken slightly 
lower. It was taken as the point above which, in the authors’ opinion, 
there was a very low probability that any spotless beetles would occur. 

The necessary summations are >) « = 3288.0 and >> 2” = 17,205.5, 
where x is measured from 2. The expression 


ke 2 = (>> 2)’ 
FD | 


was calculated as 0.30820, where n is the number of beetles (822) in 
the truncated sample. y, is needed only in estimating h and o, and is 
a moment function of h (the point of truncation measured in standard 
units; that is, h = (a — u)/o, where u and o are the mean and standard 
deviation respectively of the population). 

Cohen has plotted the relationship between y, and h, so that an 
initial estimate of h = —1.40 could be obtained from the graph. By 
successive approximations in the expression 


w= (Sle = 2) 


h was calculated as —1.383 and Z as 0.16725. Z is defined by the 
identity 


Zh) = (h)/Io(h), 


where ¢(h) is the ordinate of the normal curve at ¢ = A, and J,(h) is 
the area under the curve to the right of the point t = h. 
The estimates of uw and o are given by 


and 


To obtain estimates of the standard deviations of m and s, ten — 


independent random samples of 1011 elytra each were drawn from the 
total population of 1011 and separate estimates of m and s calculated 
for each sample. The standard deviations of m and s, each with 9 
degrees of freedom, were obtained from these 10 samples. The values 
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of m and s, with their standard deviations, are 
m = 13.068 + 0.179 
gs = 2.580 + 0.081 


Having calculated the mean and standard deviation, the truncation 
point in standard units was given by h = —1.383. The area to the 
right of x is calculated by 


Pe Ff) dt = | "(dt + | " f(b dt, 


—1.383 
where 


{i= wee (Kenney, 1947, p. 116 ff.). 
TT 


From the table for the normal curve, 
[| _ s@ at = 0.4166 + 0.5000 = 0.9166. 
—1.383 


The total number of beetles in the spotted population is calculated 
from 0.9166 N = 822. N = 896.8 or 897 and the number of beetles in 
the spotless population is 1011 — 897 = 114. The standard error of 
N is estimated by replicate calculations of N from the 10 random 
samples of 1011 elytra each; sy = 14. 

To impose the normal curve on the histogram in figure 2, values of 
x were substituted in the equation ¢ = (w — m)/s and the corresponding 
standard-unit ordinates read from the table for the normal curve. 
These were converted in turn by substitution in the expression y = N- 
f(t)/s, where f(t) is the ordinate (Kenney, 1947, p. 125). The solutions 
for y are plotted in figure 2 against the appropriate values of 2. 

The goodness of fit of the experimental data to the right of the 
point of truncation was tested by x” = 13.02, df. = 9, P = 0.17. 

At the suggestion of Wright (1952), a semilog parabola was fitted 
by least squares to the data after plotting the log frequencies in table 1 
against the classes. The goodness of fit was tested as before: x” = 25.94, 
d.f. = 10, P < 0.01. Obviously, the normal curve proved a better 
fit to the data than the semilog parabola.* 


The Distribution of Spotless Beetles 


After removing the spotted beetles from the mixed population by 
either of the above two methods, the remaining spotless beetles assumed 


*Dr, G. E. P. Box has pointed out that, if the log frequencies are suitably weighted in a least squares 
analysis, this procedure should give improved estimates of the mean and variance which would yield a 
smaller x? for the goodness of fit test, 


—" 
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a uniform distribution, except for a mode at class zero (bgo 2); Chis 
mode could have been produced by the necessary truncation at this 
point. In contrast to Shull’s evidence (1944) that his laboratory 
population contained some four modifiers of the gene Spotless, Wright 
(1952) commented that the present distribution of spotless beetles 
would not be expected if variability depended on “some four more or 
less equally frequent and equally important modifiers. There is some 
suggestion of one main modifier or, perhaps more probably, an indication 
of such inequality in the physiological significance of scale units in 
this region that no interpretation is warranted.” 


The Question of Gene Frequencies 


Shull (1944) applied his criterion of the spotless phase to several 
widely distributed populations of Hippodamia convergens. Using the 
Hardy-Weinberg equilibrium formula*, which disregards systematic, 
random and nonrecurrent changes in gene frequency (Wright, 1949), 
the frequencies of the gene Spotless for each population were calculated. 
Further, assuming four partially dominant, equally abundant and 
effective modifiers of Spotless, the frequencies of the modifiers were 
calculated**. 

Similar computations with the present data yielded a frequency of 
Spotless of 1 — (897/1011)? = 5.81 per cent, which compares favorably 
with Shull’s values of 5.84 and 6.45 for two Michigan populations. 
However, such statistics are only rough approximations, because of the 
large phenotypic overlap, the disregard of such factors as mutation and 
selection pressures, and the indirect evidence relating to the inheritance 
of the spotless phase in nature. As was noted earlier, genetic drift is 
common in small laboratory stocks and, therefore, extrapolation to 
natural populations cannot be made with a high degree of confidence. 

On the other hand, genetic drift would not be expected a priori in 
natural populations of Hippodamia, according to the known ecology of 
the genus (Cutright, 1924; Park, 1930; Balduf, 1935). Although hiber- 
nation occurs in small restricted groups (Allee ef al., 1949, p. 538), mating 
does not occur until these groups have dispersed in the spring; thus 
genetic drift would be held to a minimum and neighboring populations 
would not be expected to differ greatly in gene frequencies. 


~ <a 


i=1 


aT 2 
| > «so | , where qi is the gene frequency of the spotless gene S1 . 


*k1 — (Po /P)*! 8 where Po is the number of beetles of class zero (table 1) and P, is the total number f 
of spotless beetles. : 
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SUMMARY 


A natural population of the lady beetle Hippodamia convergens 
Guer. was submitted to a statistical analysis of its elytral pattern in 
order to separate two overlapping phenotypes. These phenotypes were 
the so-called “‘spotless’”’ and “spotted” phases, the inheritance of which 
had been investigated earlier by A. F. Shull. It was shown that the 
three posterior elytral spots, as a group, tended to be absent more 
frequently than the anterior three, and also tended to be larger in size. 
A normal curve was fitted to the arbitrarily truncated frequency distri- 
bution of the total of the greatest diameters of the three posterior 
maculations of spotted beetles. The normal curve gave an acceptable 
fit as tested by chi-square. From the statistics computed, the popula- 
tion sizes of the spotted and spotless beetles were calculated. The 
question of gene frequencies in this and other populations was discussed. 
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THE CHAIN BLOCK DESIGN 


W. J. YoupEN anv W. S. Connor 


National Bureau of Standards, Washington 25, D. C. 


1. Introduction. The development of the design of experiments 
has been inspired largely by the needs of agriculture and biology. 
The first important steps were taken by R. A. Fisher and F. Yates, 
both of whom were on the staff of the Rothamsted Experiment Station. 
The methods devised by these workers are widely used in biology and 
agriculture and, to some extent, in other sciences. 

We believe that one of the reasons for the delay in the adoption 
of experimental designs in the physical sciences is that the classical 
designs often do not meet the experimental situations encountered in 
these sciences. In fact, when one compares the experimental conditions 
which exist in the biological and agricultural sciences with the ex- 
perimental conditions of the physical and chemical sciences, one is at 
once struck by certain fundamental differences. A difference which 
is of paramount importance is the magnitude of the errors of measure- 
ment in the two areas. In agricultural and biological experimentation 
the basic experimental material often is land or animals, and the varia- 
tion over a field or between litters is likely to be large. To compensate 
for this large variation, the classical designs require repeated measure- 
ments to effect a reduction in the error of the estimates. 

Physical measurements can be made with high precision and the 
experimental material usually is relatively homogeneous. It follows 
that it is not necessary to put great reliance on replication in order to 
achieve good results. Excellent estimates may be obtained from one 
measurement, or at most, two or three. 

It is with the needs of the physical and chemical sciences in ‘mind — 
that we offer the Chain block design. This design is very flexible, 
and the number of replications is small. Thus, for a set of quantities 
to be compared, this new design calls for some of them to be measured 
once, and the others twice. 
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2. Definition of the Chain Block Design. Let us begin by considering 
a simple example of a Chain block design. Suppose 10 quantities are 
to be compared. Denote the quantities (traditionally, and hereafter, 
called treatments) by letters and arrange them in three blocks as fol- 
lows. 


BLOCK 


1 2 3 

A C E 

B D F 

G E A 

D PF B (2.1) 
a b Cc 

d e€ 


We observe that the treatments which are denoted by capital letters 
are replicated twice each, whereas the treatments which are denoted 
by small letters are replicated only once each. Further, A never occurs 
in a block without B, nor C without D, nor HE without F. Thus, the treat- 
ments which are replicated twice fall into 3 groups, and these groups 
are the links in the chain of blocks. Treatments C and D link blocks 
1 and 2, EF and F link blocks 2 and 3, and A and B complete the chain 
by linking blocks 3 and 1. 

In general, there are v treatments which are arranged in 6 blocks. 
A treatment is said to belong to class C, if it is replicated once, or to 
class C, if it is replicated twice. There are v, treatments in C, and 
v, treatments in C,. 

The treatments of C, are divided into b groups of n,; treatments 
each, (7 = 1, --- , b), where a group is characterized by the property 
that the treatments of the j-th group occur in the j-th block, and no- 
where else. 

Similarly, the treatments of C, are divided into b groups of n, 
treatments each, (n2 > 1), where a group is characterized by the 
property that if a treatment of the group occurs in a block, then the 
other n. — 1 treatments of the group also occur in the block. Further, 
blocks 7 and j + 1 have in common the treatments of the j-th group, 
(j = 1, --- , b; mod 6), but no other treatments in common. No other 
two blocks have any treatments in common. 

We shall denote the number of units in a block by k; and let, 

i-1 k; = N. From the above, the following relations hold: 
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V1 “ 2. = v, 
0, + 2, = N, 
b 
mn = 01, (2.2) 


7=1 
bro =.05.5 and — 
1; + Qn, = k; . 


If we let G;; , @ = 1, 2;7 = 1,---, b), denote the treatments of 
the 7-th class which are in the j-th group, then we may display the 
Chain block design as follows: 


BLOCK 
ohare: b 
Gis + Gu Gn Goo) Sere 
i cee ck Ge UR ae 


Gi Gis Gis ——— Gy, 


Since the average number of replicates is (v, + 2v.)/v, we may 
refer to the Chain plot design as a “(v, +20) /0 replicate” design. 
Of course, 

1<@+ 20,)/v Sele 2.4) 


8. Some Practical Aspects of the Design. Perhaps the most striking a 
Peas site tite, flexibility, Ge eg Siew ae 
C; and C, a h cen erimenter, as we 

dv = = 2, and let there be 
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In this design, v, = 14, »» = 8, N = 30, and there are 5 degrees of 
freedom for error. The number of degrees of freedom for error in the 
general case is given in Section 4. 


DESIGN 2: BLOCK 
1 gees 
A D G ye 
B E H i 
C F ig L 
D G J A 
E H K B (3.2) 
Pi fs L C 
m D 8 u 5 
n qd -t v : 
0 r 


In this design v, = 10, 
freedom for error. 


= 34, and there are 9 degrees of 


s 
| 
— 

meh 
= 


. DESIGN 3: BLOCK 
: | Wake eae es 
A E Z M 
: Ee jg =A J N 
6) G K O 
chara os Rs Stes mee 
+ Fidel 3 No Sh ee 
EL SE ee Cg eS 2 ON can) SMR pw 
f ee oa ort ae i ae “ty ag 
ata ee ae NG ¥en rs ~ Ao 
( ey 
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experimenter will have to decide whether there are approximately 
8, 12, or 16 treatments which should be estimated with the smaller 
variance. Again, the experimenter must decide whether he needs 
5, 9, or 13 degrees of freedom for error, and this decision will depend 
in part on the magnitude of the variance of a single determination or 
yield. 

Although the definition of the Chain block design does not specify 
the distribution of the treatments of C; among the blocks, it seems 
desirable to distribute them as evenly as possible. For example, in 
Design 1 we have put either 3 or 4 treatments of C, in every block. 

Now consider the difference between the estimates of any two 
treatments. The variance of this difference will depend on where 
the treatments are in the design. For example, consider the treatments 
of C, in Design 1. If the treatments are in the same group, as A and 
B, the variance of the difference between their estimates is the smallest 
variance in the design: 

To explain this point further, let us think of the groups as arranged 
ina circle. Thus, around the circle we have groups 1, 2, --+ , b, with b 
followed by 1. Now consider any two groups of C, , say j andj’. Let 
the number of groups which lie between 7 and 7’, when counted in the 
shortest direction around the circle, be p. Then, the variance of the 
difference between the estimates of a treatment of 7 and one of 7’ varies 
directly with p. Thus, in Design 1, we have a larger variance for the 
difference between A and C than for A and B, and a still larger variance 
for the difference between A and EZ. Similar statements can be made 
for treatments in C, , or for one treatment from C, and the other from 
C,. The experimenter should, in so far as it is possible, be guided by 
these considerations when assigning treatments to groups. : 

4. The Analysis in General. The analysis which],we “shall give 
below can be rigorously justified, for example, by the theory of§{1]. 

Let the typical yield be denoted by 2z;;,. , where the first index 
refers to the class, the second to the group, the third to the treatment, 
and the fourth to the block. Thus, 7 = 1 or 2;7 = 1, --: ,b;u=1, 

-,n;;;andz=jorj +1. Let ¢;;, denote the effect of the u-th 
treatment of the j-th group of C; , and b, denote the effect of the z-th 
block. Then we assume that 


Lijuz = bijn ae b, of €iju) (4.1) 


where ¢,;, is a random variable with mean, zero, and with variance, 
o°. Further, we assume that e;;, is independent of the corresponding 
random variable which is associated with any other yield. 

Let the least squares estimate of a treatment or block effect be 
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denoted by putting a circumflex (-) over the corresponding eure 
Thus ia is the estimate of ¢;;, and b. is the estimate of b. 
Also, let 


Xe = So tye and Dy = Xai — Xascen - 
Then by imposing the restriction, 
> b, = 0, (4.2) 
we may estimate the effect of the j-th block by 
Ont), = (= Y— DD — Did, 48) | 


where a is the largest integer which is less than or equal to (6 — 1)/2. 
In (4.3) and in the formulas below the subscripts should be reduced, 
mod b. 

The treatment estimates are easily found by using (4.3). Thus, 


, eS ein — b, (4.4) 
and as : | 
Dion = Lojuj a Vojuci+1) — b; ~~ bee ae (4. 5) 


To carry out the analysis of variance, we first compute He sum a 
squares due to error, which we shall denote by S*(e). For the j-th 
group of C, we form the differences, : 


hie =D Renee )s @u=1,-- Pips (4.6) 
Now dj, is an n estimate of the difference pelvis the j-th and (j + vn 
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The remaining degree of freedom is given by 


Y= 1/(2im)| Xo0(241) - > Boalt (4.8) 
Hence, 
Se = a Sie) + ¥- (4.9) 


We now compute the sum of squares due to blocks, uncorrected 
for treatments, and the total sum of squares; and then obtain the sum 
of squares due to treatments by subtraction. If we let B; denote the 
sum of the yields in the j-th block and @ denote the sum of all of the 
yields, then the sum of squares due to blocks uncorrected for treat- 
ments is 


b 
Sb) = 2) Bi/k; — @/N, (4.10) 
; pire ) . 
and the total sum of squares is 


SOy= 2) = CIN ——=— (4-1) 


where the summation is over all values of the indices. 
We obtain the sum of squares due to treatments by subtraction, thus: 


S(t) = S(T) — S— — S*(b). (4.12) 
The analysis of variance table is as follows: 


ANALYSIS OF VARIANCE TABLE I anes 
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and compute the quantities, 


S3(b) = [x B:; — G3/ | (4.18) 
and 
SP) = D> tries —-Ga/2med, (4.14) 


where the summation is over all values of the indices. Finally, the 
sum of squares due to treatments of C, , corrected for blocks, is 


S2(t) = S3(T) — S*(e) — S2(0), , (4.15) 
and the corresponding mean square is 
sx(t) = S2(t)/@2 — 1). (4.16) 


We may want the sum of squares due to blocks corrected for treat- 
ments. This quantity can be found as follows. We compute the 
quantities 


Pe ey 3(D. Sa Diy: (4.17) 


(2g = 1, -:- , b), and then find the sum of squares due to blocks cor- 


rected for treatments, S’(b)’, by 


Sb)’ = ys bP. ~ (4.18) 


The sum of ener due to treatments, af ate for blocks, 
S*()/, is found by subtraction, thus, — 
- SO! = ST) — S%(b)’ — SO. — (4.19) 
oe osama sol penile i, Sol 
RE ae ANADTSIS OF VARIANCE "ADL : aaa 


Sgt es CF 
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If we assume that the errors are normally distributed, we may 
carry out the usual F and ¢ tests. It should be noticed that the ratio 
of s(t) to s°*(e) is F with (v. — 1) and (NV — v — b + 1) degrees of 
freedom. 


Let 


(22 — 1)b —W 
Nob : 


f@=14 


Then for any two treatments of C. , say t;, and to. ; 


V (loin Pm; a = <) 4 = 8; or a [f(2) Jo”, J * 8; (4.20) 
and for two treatments of C, , say t,;, and t,, , 


V(triw — tow) = [f@ + (1 + 1/m)]o’, (4.21) 


where 7 = j — s, mod b. For a treatment of C, and a treatment of 
Oe, BBY fy ADU. , 


Vide — be) = 5 (s + te L) J =8, or 


bela) le js, (4,22) 


where 7’ = j — s, mod (6 + 1). 

5. An Example. It is characteristic of many experimental situations 
in the physical sciences that the block is sharply defined. This con- 
trasts with the arbitrary designation of a given land area as a block in 
agricultural field trials. For example, spectrographic determinations 
of the chemical elements may be carried out by the comparison of 
spectrum lines as recorded on a photographic plate. A limited number 
of exposures may be made on a plate which is then developed. Ob- 
viously, all determinations on one plate experience the same processing, 
and comparisons within a plate have been demonstrated to be more 
precise than comparisons between samples run on different plates. 


When the number of samples exceeds the capacity of a plate an oppor- _ 


tunity is presented to use an arrangement to correct for block effects 
and to do this with a minimum amount of replication. The example 
chosen concerns a study of the nickel content of a large number of 
rods prepared from the same ingot. The study was made by B. F. 
Scribner of the Spectrochemistry Section of the National Bureau of 
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Standards. It is desired to detect possible differences in composition 
The rods were made from a selected portion of the ingot to insure a 
high degree of uniformity among them. 

In the example, b = 3, », = 30 and v, = 12. Hence, the design 
is a ‘12 replicate’’. 

We shall let iju, (¢ = 1, 2;7 = 1, 2,38;u = 1, --+ , mz 5m; = 10 
or 4 according as 7 = 1 or 2) denote the u-th treatment of the j-th 
group of the 7-th class. Then the treatments occur in the blocks as 
follows: 


BLOCK 
1 2 3 

231 211 221 

232 212 222 

233 213 223 

234 214 224 

211 is 231 

212 222 232 - 

213 hs 223 233 (5.1) — 


a 
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. 1 2 3 
. Symbol Amount | Symbol Amount | Symbol Amount 
eee tas = 4 Vict 
Xa = 7 L222 = 3 T2023 = O 
Xo331 = 14 Xo132 = 10 — Xo233 = —3 
Xo3u1 = 9 Xo140 = 6 Xo2a3 = —8 
Loi = 13 Leon = 5 X13 = 1 
Zein = 15 Leo02 = 7 Xo33 = 5 
Leisi = 12 Xoo32 = 2 oas3 = 2 
Lois = 9 Xooa2 = 6 Tosa = 0 (5.2) 
pig = 1 Pine = 10 Xuan =" 5 . 
Dio = 5 ioe = _—_ 1323 = —1 
tus = 17 Tae = 6 Sis = —3 
Lia = 14 __ isis = ue Bos ‘ Lisag = —6 
fuci = 12° Lies = 6 * Hia5p = 2 
te = 18 Lives = 4 Lises = —2 
: oes ss 


fun = 14 hie = 7 Vis73 = 
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dy, = 9 dy, = 6 d3, = ai 


is = 12. Ose = if day — —2 
Bris oa 2, hes — 5 Obs: — —12 
dus ad 3 (dln = 14 hen = —9 (5.5) 
4 : 4 4 
diy = 26 >, des = 32 yd = 30 
u=1 u=1 u=1 
4 4 4 
> di,, =:238 2 ds a= 306 >) di, = 278; 
u=1 =1 u=1 


3 
By = 87, Bu = 48, Bo = —4, G, = 126, >> Bi; = 9,484; (6.6) 


7=1 


3 
By = 214; By = 118, B, = —8, G= 324, >) B; 


pa 


59,784; (5.7) 
and 
potas = 18885 para 3, SU, (5.8) 


where the summations are over all values of the indices. 
We estimate the effect of the first block by (4.3) and (5.4), thus, — 


24), = 2(56) = 412, or ae | 
6, = 14/3. 6,988 


similarly, 4, = 1/2 and $, = —31/6. The cum-of these estimates 
is zero, as it should be. 
ne ae of the effect of ti: is found from. (4.4) and (5.2) to be 4 


8 fu = [Pes 19/3 me (5.10) 
ti A eatamient eee sen 
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so that by (4.9), 
S*(e) = 118.67. (5.14) 


From (4.10) and (5.7), the sum of squares due to blocks is 


2 a, a0 a si 2 
S*() = 7g (59,784) — =; (824) 
= 3,321.33 — 1,944.00 (5.15) 
= 1377.33, 


and from (4.11), (5.7), and (5.8), the total sum of {ieee is 
S(T) = 3,862 — — oq (324)? = 1,918.00. wie (5:16) 


Hence, by (4.12), i. 14), (5.15), and (5.16) the s sum of squares Bie to 
treatments is — 
S*(é) = 1,918 00 — 118.67 — 1,377.33 = 422, 00. a7 


The analysis of Variance Table I is-as-follows: 


ANALYSIS OF VARIANCE cp. I 


| Deeress ot | Sum of Mean 


Treatments (corrected for blocks) | 41 | 422.00 | aoe 
Blocks (uncorrected. 2 1377.33 | Nee Ze 


ected) ener se 
: : st ig) ig ee 5 Waa alse 
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Hence, by (4.15), (5.14), (5.18), and (5.19), the sum of squares due to 
treatments of C, , corrected for blocks, is 


S:(t) =-726.50 — 118.67 — 517.75 


= 90.08, (5.20) 
and by (4.16) the corresponding mean square is 
s(t) = 90.08/11 = 8.19. (5.15) 
From (4.17), and (5.4), 
P, = 3(26 + 30) = 28.00, (5.16) 
P, = 3.00, and P; = —31.00. Thus by (4.18) and (5.9), the sum of 
squares due to blocks, corrected for treatments, is 
S*(b)’ = 292.33, (5.17) 


and by (4.19), (5.14), (5.16), and (5.17), the sum of squares due to 
treatments, uncorrected for blocks, is 


S*(i)’ 


I 


1,918.00 — 118.67 — 292.38 
1,507.00. (5.18) 


If the reader desires to do so, he may write out the Analysis of 
Variance Table II. The F ratio for blocks, corrected for treatments, is 


F-= 12.31, (5.20) 


which is significant at the .01 level of significance. Thus, we have 
been wise to remove the error due to blocks. 

Since the treatment mean squares are not significant and in fact, 
are less than the error mean square, we might decide not to carry out 
¢ tests. However, for illustrative purposes, we shall compare the 
effects of 213 and 224. From (4.20), we find that the variance of the 


difference, (213 — tos), is (18/12)o°. Hence, using our estimate of 
o’, that is, s’(e), we obtain 


dial tiaras tr SLU Get OD lie 
' = 43/12)"s@) ~ 3.731 = 1.00, oe) 


which is not significant. 
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Design. 

Recently, the author (1950) called attention to the importance of 
enumerating partially balanced incomplete block (p.b.i.b.) designs 
involving as few as two replications of each treatment. In that and a 
subsequent note (Nair, 1951) several examples of p.b.i.b. designs 
involving only two replications and having either 2, 3 or 4 associate 
classes were given. Bose (1951) made an exhaustive study of two- 
replicate p.b.i.b. designs having 2 associate classes. 

One of the two-replicate p.b.i.b. designs given in the author’s (1950) 
paper consisted of a design having the parameters: 


et PG kp SN) rt = 2 ep 
i= 1 A. = 0 
™m = 2p— 2)  n, = 1/2 (p — 2)(p — 3) 


; as (p — 3) 
Ligeas 
@-—3) 1/2@-3)@-4) 


2(p — 4) 
2 
(Vn = 
2p — 14) Sh 2 lp aD) 
This design is constructed by dualising the unreduced balanced __ 
incomplete block design in blocks of 2 plots for which the parameters 
y*, k*,-r*, b* and d* are: 


y¥ =p, id tat Pd e* = (p— 1); b* = (1/2) p(p — I), rx* = 1. 


uj) *Address: Forest Research Institute, Dehra Dun, India. 
141 
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The dual design v = b*, k = r*, r = k*, b = v* has the property 
that there is just one treatment common to every pair of blocks. It 
therefore belongs to the class of designs which Youden (1951) has 
recently used under the name Singly Linked Blocks. 

Bose (1951) has shown that it is easy to write down the blocks of 
the design by using the scheme given below, illustrated for the case 
p = 5. The blocks are given by the rows (or columns). 


etd Hee Rae 
ber id Oren 
aie peek 
Say Cae LU 
AiO LO es 
SCHEME 1 


Thus, the five blocks of the design are: 
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The following diagram illustrates the method for p = 5. 


i: 


FIGURE 1. 


When p is odd, the configuration can alternatively be presented 
artistically (like a star), as illustrated below for p = 5. 


1 


FIGURE 2. 


One of the advantages of the scheme devised by Bose is that, in one 
stroke, it gives a simple method of constructing the design as well as a 
simple rule for picking out the first and second associates of each treat- 
ment, Thus, in Scheme 1, all treatments whose numbers appear in the 
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same row and column are first associates of each other. For instance, 
the 1st associates of treatment no. 1 are 2, 3, 4, 5, 6 and 7, while its 
second associates are 8, 9 and 10. : 
There is another way in which the association scheme of the design 
can be presented. Let us denote each treatment by two numbers 


(i, j) where 7 takes the values 1, 2, --- (p —1) and 7 takes the values 
i+1,i+2,+--, p, fora particular value of 7. For instance the treat- 
ments 1, 2, --- 10 of Scheme 1 may be written as follows: 


(12) (13) (14) (15) 
(23) (24) (25) 
(34) (35) 

(45) 


SCHEME 2 


Treatment (77) occurs in the 7th and jth blocks of the design. Hence, 
treatments with one digit in common are Ist associates and treatments 
with no digit in common are 2nd associates. 

Bose and Shimamoto (1952) have recently found that the assocza- 
tion scheme of a number of p.b.i.b. designs having 2 associate classes, 
for which v = 1/2 p(p —1) but k not necessarily equal to (p —1), can 
be represented in the same way as for the design: v = 1/2 p(p —1), 
k = (p —1), illustrated in Scheme 1 for p = 5. They have given this 
type of p.b.i.b. designs the name: Triangular Type. In particular, the 
design for v = 1/2 p(p —1) and k = (p —1) has been classified by them 
under the name Triangular Singly Linked Blocks (TSLB). 

It is fairly obvious that Scheme 2 can be used to represent the 
association scheme of any triangular type design. 

If the basic TSLB design involving only two replications of each 
treatment is repeated, say, / times in the experimental lay-out, so that 
there are / basic groups each having p blocks of (p —1) plots, the result- 
ing design may be considered as a partly resolvable p.b.i.b. design 
having the following parameters: 


t=AliZipip— 1), k= @— 1), r= 21, d= ip, 4 = 1, tee 


and 7; , N2 , p;, remaining the same as in the basic TSLB design. 
The method of analysis for both the cases will be discussed in the 
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next two sections and a numerical example for the analysis of the basic 
design given in the last section of the paper. 


Analysis of the Basic TSLB Design. 


The statistical analysis of the data of an experiment laid out using 
triangular singly linked blocks can be performed by direct substitution 
in the general formulae for p.b.i.b. designs having 2 associate classes 
(See Nair, 1952). By using association Scheme 2, the resulting ex- 
pressions for estimates of treatment effects, their variances and the 
components of the analysis of variance can be very much simplified. 

The values of the quantitative character (x) under study for the 
_ two replications of treatment (77) will, for the convenience of analysis, 
be denoted by z,; for the plot belonging to the 7th block and by 2;; for 
the plot belonging to the jth block. The whole data can then be pre- 
sented as in Table 1 below. a. 
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Let 7; denote the total value of x for treatment (77). Then T= 
t,; + x,;;. Values of 7';; may be presented as in Table 2 


T 12 T 13 T 14 ae Tp 
T 03 T 04 yo Top 
Tp-1,p 
TABLE 2 


The general formula (37) for estimating the effect of a treatment, 
with recovery of inter-block information, given in the author’s (1952) 
paper simplifies for this particular design to the form 


peeled 
= 12 1, a wi (a. a sy) sm (a;. = A aS to p(p ris 5 | (1) 
where, 

(w — w’) 

; 2 

po + @ — Dw ‘i 

The adjusted treatment total with recovery of inter-block informa- 
tion is therefore, 


fea! 


PPE oA Vs i (3) 


Dividing by 2, we get the corresponding adjusted treatment mean. 

To get the corresponding intra-block estimates, we have only to 
substitute w’ = 0 or uw = 1/p in (1) and (8). 

To estimate w and w’ and from them the value of u we have to 
perform the analysis of variance on the data of Table 1. 

The total sum of squares, the treatment sum of squares (unadjusted 
for block effeets) and the block sum of squares (unadjusted for treatment 
effects) are calculated in the usual way. They are: 


Pp pot 2 
Total 8.8. = eee eo 
pte 2, tds & 
: ie 1 a ee 
Block 8.8. (unadj.) Ea PE, ne on (5) 


Treatment 8.S. (unadj.) = (1/2) Ss Sy (he 


Merri eke a aie = (6) 
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It is easier in this design to first calculate the sum of squares for 
blocks (adjusted for treatment effects) than to calculate the sum of 
squares for treatments (adjusted for block effects). The former is 
given by the expression 


Block $.8. (adj.) = 3- Y(t. — 24)? (7) 
iP n=1 
The latter is then calculated by subtracting (5) from the total of (6) 


and (7). 
The analysis of variance is presented in Table 3. 


———o 


TABLE 3. ANALYSIS OF VARIANCE 


Source of Variation | Degrees of Freedom| Sum of Squares ig Mean Square © 
Blocks (adj.) gest} Sos focally), er J 
Treatments (unadj.) |(1/2)(p + 1)(p. — 2)| See formula (6) 
Intra-block error (1/2)(p — 1)(p — 2) (4) — (6) — (7) ae J 
Total (p? —p —1) | See formula (4) 
(p — 1) See formula (5) 
(1/2)(p + 1)(p — 2); (6) + (7) — () | E; 
The v variance ratio F=E#E,/E, provides ; a test of significance of inter- oa 
block estimates of treatment effects. ares a 


The author (1944) had shown that for a ‘a sadommplete 
q as the tes of w and w’ en w 
; wt SSAC, ana aint uy TOe fad ER? aie R 

{4H 5) 5 ==: As a 
nies on 2 
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those pairs which occurred together in the same block (..e., those treat- 
ments with one digit common) and those pairs which did not occur 
together in the same block (i.e., those treatments with no digit common). 

(i) Variance of the difference between two treatments with one 
digit common can be obtained by direct substitution in formula (39) 
of the author’s (1952) paper. It simplifies to the form 


1 
=, ie) (11) 


(ii) Variance of the difference between two treatments with no digit 
common can be obtained by substitution in formula (40) of the author’s 
paper. It simplifies to the form 


° 


il 
FAC aa 2u) (12) 


(iii) The mean variance for differences for all pairs of treatments is 
derivable from formula (41) and simplifies to the form 


1 2(p — 1) | 
slit eau a) 


By substituting » = 1/p in (11), (12) and (18) we get the correspond- 
ing variances for intra-block estimates of treatment effects. 


Analysis when the Basic TSLB Design is Repeated Several Times. 


Let the basic TSLB design be repeated / times in / compact groups 
of p blocks each in the field, the p blocks of each group being inde- 
pendently randomized. In every basic group the blocks will be numbered 
in such a way that block no. (h) will be the one containing the (p —1) 
treatments (1, h), (2, h), --- (h —1,h), (h,h + 1), --- (A, p). These 
treatments will be randomized within this block. 

Let x;;, and x;,, be the values for treatment (77) in the plots allotted - 
to it in block numbers (7) and (7) of the kth basic group. Let 


1 I 
Cie = os Viik 5 Lj = SS Uiik + 

k=1 k=1 
We have to set up a table of values of x;;, and z;;. similar to Table 
1 and calculate the marginal totals x,.. and x.,, ; and the grand total 
x,.,. The total for treatment (77) will be T;; = 2:3, + 2;,. . =i 
The total for block no. (h) of the kth basic group will be x,., and the 

total for the group will be z.., . 


Estimate of effect of treatment (ij) with recovery of inter-block 
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information is 


4 1 ve : . . pens 
= aI E — pi(zy.. — 2...) + (a;.. — 2.;.)} — ee (14) 


where u has the same value as in (2). 
The corresponding adjusted treatment total is 


Die Mihi. — 2.2) (23.0 = 2;)} (15) 


Dividing (15) by 21 we get the corresponding adjusted treatment mean. 

To obtain intra-block estimates we have to replace u by 1/p. 

In the analysis of variance the (lp-—1) degrees of freedom for 
blocks can be split up into three components of (J —1), (p —1) and 
(1 —1) (p —1) degrees of freedom. The first and third components are 
orthogonal to aaa comparisons. Their sums of squares are 


i ie; 
eT ie (22! rs iat.) (16) 
I~. Iv, ites 
ae Sa fod. - yet pe.) (17) 


The expectation of (16) involves w, w’ and the inter-group variance 
between blocks of size p(p —1). It is therefore not available in esti- 
mating w’. 

The expectation of (17) is (J —1) (p —1)/w’. 

The second component is non-orthogonal with treatment com- 
parisons. Its sum of squares after eliminating treatment effects is 


1 < 2 
2lp ds (Xp-- ae Lon) (18) 


Using the results of Table VIIB of Bose and Shimamoto’s paper 
we can show that the expectation of (18) is 1/2[(p)/w’ + (p — 2)/vu]. 

To get the best estimate of w’ we should combine (17) and (18). 
This has been done in the analysis of variance shown in Table 4. 

Estimates of w, w’ and y» in terms of FH, and E£, are given below 


1 
= ue 
Pe eG ent 8 ig oS ort 
CS 29 ol), (pees oH) 
lH, — E.) (21) 


+= pe, + U— Do — DE, 
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To calculate the variances of differences between adjusted treatment 
means with recovery of inter-block information we have to divide (11), 
(12) and (13) by 1. 


Numerical Example. 


The data used in this example were made available to me by Dr. 
W. J. Youden of the Statistical Engineering Laboratory, National 
Bureau of Standards, Washington, D.C. from a Technical Report 
(unpublished) by John E. McKinney, George E. Decker and Frank L. 
Roth of the Bureau’s Polymer Development Branch, Research and 
Development Division. 

Physical properties of samples from 10 bales of a particular brand 
of synthetic rubber were measured in order to determine values of the 
properties for the lot and to measure the uniformity of the material 
throughout the lot. 

Two compounded batches were prepared from samples of each of 
the 10 bales. Simce there was not sufficient material from any one 
bale to make more than two tests and since it was not possible to test 
all the bales in a single day, the scheme in Table 5 was used. 


TABLE 5. ALLOCATION OF BALES 


Day Bale Number 
(1) 1 2 3 4 
(2) waer ten 5 6 7 
(3) 2 5 8 9 
(4) 3 6 8 10 
(5) a id a 10 


It will at once be noticed that if ‘days’ and ‘bales’ denote ‘blocks’ — 


and ‘treatments’ respectively, the scheme in Table 5 is identical to 
Scheme 1 for linked block designs for the case p = 5 discussed earlier. 

Table 6 shows the values for one of the properties, namely, strain 
at 400 psi. Each value is the median for 3 specimens from a vulcanized 
sheet. 
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TABLE 6. DATA 


Day Strain (%)* at 400 psi 
(1) 35 20 13 25 
(2) 16 16 21 27 
(3) 10 5 20 15 
(4) 26 24 37 31 
(5) 21 16 20 17 


*Coded—300 


Using Scheme 2 for numbering the bales and Table 1 for presentation 
of the data we shall rearrange the data of Table 6 as shown in Table 7. 


TABLE 7. RE-ARRANGED DATA WITH MARGINAL CALCULATIONS 


Block cy od ‘| | Total | 
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The treatment totals are tabulated below: 
(12) (13) (14) (15) 


bl 30 30,46 
(23) (24) (25) 


21 45 48 
(34) (35) 
Hieoo 
(45) 
48 
(a) Total sum of squares = (35)? + --- + (17)? — oe = 1207.75 
(b) Block sum of squares rn (93)? + +--+ + (74): _ (415)? Ber a 
(unadjusted) 4 20 aoe 
(c) Block sum of squares =e atin 
(adjusted) 20 a ih Ge) Ts, 50 
(d) Treatment sum of = Cian + C38 Sib), 504.25 
' squares (unadjusted) =~ cs :. 20 
(e) Treatment sum of — a ee = 481.6 
squares (adjusted) () + @d) (0) ; 
(f) Intra-block error = SPs Re ae 
sum of squares Ose eI ORICON 
= 20010 et Pin 


Je Those sums of squares can now be presented inthe following tab ~ 
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The variance ratio F = H,/E, = 1.27 is not significant. 
Estimate of yu is 


- = 0.1470 


Each value of x in the hth block may now be adjusted by subtracting 
from it either u(z,, — v.,) or 1/p (a. — %.,) according as the adjustment 
desired is with or without recovery of inter-block information. ‘The 
adjusted values for the former case are given below. 


Block Number 
(1) a 32.06 17.06 10.06 22.06 
(2) 16.00 bs 16.00 21.00 27.00 | 
(3) ee Gece ines? . 26.32 21532 
(4) | 19.09 17.09 30.09 . 24.09 
(5) 24.538 19.53 2an0e 20.53 * 


The corresponding adjusted treatment totals are: 
(12) (13)2 (14) (15) 


48.06 33.38 20.15 46.59 
(23Yaih 2S} dee (2B ge a 


p 8 Ligh aes 
‘ 
owe dnolingetal fyi 
’ ¥ - 1 ¥ 
“Sy re sult | 
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In these triangular tables the easiest way to spot 1st associates is to 
remember that any two treatments with one digit in common would 
have occurred together in the same block e.g. treatments (12) and (25) 
occur together in block no. (2). Those with no digits common would 
not have occurred in the same block and are 2nd associates e.g. treat- 
ments (12) and (34). 

(a) The variance of difference between means, adjusted with 

recovery of inter-block information, of two treatments occurring 
together in the same block (lst associates) is given by 


= (1 + u) = 33.385 X 1.147 = 38.25245 
The lowest significant difference between adjusted means at 
5 and 1% levels are 

L.8.D:.o5 2.447 KX V38.25245 = 15.13 

LS.D..o. = 3.707 X /38.25245 = 22.93 


(b) The variance of difference between means, similarly adjusted, 
of two treatments not occurring together in the same block 
(2nd associates) is given by 


I 


(1 + 2u) = 33.385 X 1.294 = 43.15490 
The lowest significant difference between adjusted means at 
5 and 1% levels are 

LS.D..o5 = 2.447 K V43.15490 = 16.07 

LS.D..o, = 3.707 X V43.15490 = 24.35 


(c) The mean variance for differences for all pairs of adjusted 
treatment means is 


ify re 4.) = 33.35 X 1.196 = 39.88660 
If we are thinking of using one common L.S.D. for every pair 
of adjusted means, its value for 5 and 1% levels are 

LS.D.o, = 2.447 X V39.88660 = 15.46 

LS.D...; = 3.707 X V39.88660 = 23.41 


To get corresponding values of L.S.D. for intra-block adjusted 
treatment means we have only to replace u by 1/5. These L.S.D. values 
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are strictly valid, but not the L.S.D. values calculated above for means 
adjusted with recovery of inter-block information owing to the fact 
that u is estimated from the data and not known a priorv. 

Finally, I wish to express my thanks to Professor Gertrude M. Cox 
for her continued interest in this work which was done at the Institute 
of Statistics, The Consolidated University of North Carolina, Raleigh, 
during the tenure of a joint Fulbright and Smith-Mundt Fellowship 
awarded to the author by the U.S. Government. My thanks are also 
due to Dr. W. J. Youden for supplying me the experimental data for 
the numerical example. 
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SPLIT-PLOT HALF-PLAID SQUARES FOR IRRIGATION 
EXPERIMENTS’ 


WALTER C. JAcoB? 


The rapid developments in the field of experimental design have 
provided many designs which appear to be extremely complicated. 
It seems desirable to publish examples of the use of various designs 
so that other research men will have the benefit of knowing how the 
design actually worked in practice. It is the purpose of this paper to 
present an example of the use of a complex design in vegetable crops 
research. The reasons for the choice of the design and the various 
steps involved in the analysis and interpretation of the data are given 
in sufficient detail so that the non-statisticians will be able to utilize 
such a design. 

One of the major projects of the Long Island Vegetable Research 
Farm is the determination of the fertilizer requirements of the various 
vegetable crops including potatoes. Recommendations have heen made 
for fertilizing potatoes, but the advent of extensive irrigation of potatoes 
on Long Island raised the question as to how much difference there 
would be in fertilizer requirements of potatoes under irrigated con- 
ditions compared to potatoes grown without irrigation. Many new 
varieties of potatoes have been recently introduced and preliminary 
information indicated that some of the newer varieties had fertilizer 
requirements which differed considerably from the requirements-of the 
standard varieties in use on Long Island. 

The specific problem for which this experiment must furnish an 
answer could be stated as follows: 

‘What influence does irrigation have on the nitrogen, phosphorus, 
and potash requirements of different varieties of potatoes grown under 
Long Island conditions?” Thus there were five factors to be investi- 
gated, namely irrigation, nitrogen, phosphorus, potash and variety. 


1Paper No. 310, Dept. of Vegetable Crops, Cornell University, Ithaca, New York. 
2Professor of Vegetable Crops, Cornell University, Ithaca, New York. 
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SELECTION OF THE DESIGN 


The area of land available for this experiment consisted of two 
large blocks each 214 feet by 260 feet in size. Irrigation was equally 
available over the whole area with the one restriction that all rows must 
parallel the irrigation lines and thus must run the 260 foot length of 
each block. Irrigation plots require large isolation borders and, since 
there was little interest in this experiment in testing the benefits of 
irrigation, it was decided to divide each block in half, one half to be 
irrigated and the other not. In this way any effect of irrigation would 
not be confounded with blocks and the estimation of fertilizer and 
variety interactions with irrigation could be viewed with more con- 
fidence. These whole plots consisted of 36 rows of potatoes each about 
260 feet long. Four rows were left between plots for irrigation border 
areas. 

Previous work (2)* has indicated the desirability of having at least 
one guard row on each side of a fertilizer plot to eliminate border effect. 
Thus two rows of each plot will have to be discarded and a minimum 
size plot would be three rows wide. To utilize the experimental area 
more efficiently and to provide some estimate of within plot variability, 
plots 4 rows in width were chosen as the smallest practical size. Using 
three levels each of nitrogen, phosphorus, and potash and three varieties, 
there were 81 treatments to be applied to each whole plot. Each whole 
plot had 9 sub-plots in the 36 row width and thus for 81 treatments 
each whole plot would have 9 plots down the length of the potato rows. 
Since each whole plot was over half an acre in size it. was considered 
advisable to select an arrangement of the sub-plots so that soil variability 
within each whole plot could be removed at least partially from the 
estimate of error. Yates (5) has constructed a number of confounded 
arrangements in Latin squares. By confounding some of the inter- 
actions with the rows and columns of the squares, differences among the 
rows and columns can be eliminated from the experimental error. The 
81 treatments can be arranged in a 9 x 9 quasi-Latin square by con- 
founding portions of second and third order interactions. The potato 
rows corresponded in direction with the rows of the Latin square. Each 
plot was 28 feet long. One of the factors involved was variety and it 
was impractical to change variety at planting time for each plot. By 
applying only one variety to each row of the square we have the design 
described by Yates (5) as a half-plaid Latin square. Although this 
feature of the design was desirable from the standpoint of practicability 
in field operation it should be pointed out that considerable precision 


*Numbers in parentheses refer to Literature Cited at end’ of paper. 
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was lost in the varietal comparisons. The comparison of varieties was 
not as important as some of the interactions between varieties and 
fertilizer factors so the loss in this comparison was felt to be justified by 
the increased precision available for the interactions and the saving in 
time and labor in installation of the experiment. 

Figure 1 shows the construction form of the basic square before 
randomization (c. f. Yates (5)). This basic square was used in preparing 
the field layout for all four whole plots. Each plot was obtained from 
this basic square by a separate randomization of the rows and columns. 


FIGURE 1. THE 9 X 9 HALF-PLAID LATIN SQUARE BEFORE RANDOMIZATION 


NPK “W” Confounded 


111 213 312 122 yoy 323 133 232 331 
Sis. 4. 112-4. Sih 321 123 222 332 131 233 
212 311 113 223 322 121 231 333 132 
331 133 232 312 111 213 323 122 221 
233 332 131 211 313 112 222 321 123 
132 231 333 113 212 311 121 223 322 
221 323 122 232 331 133 213 312 111 
123 222 321 131 233 332 112 211 313 
322 121 223 333 132 231 311 113 212 


NPK “X” Confounded 
BRAwSWRQAAQWP, 


Letters represent the three varieties. 
Numbers represent levels of N, P, and K respectively. 


Figure 2 shows the final field plan. The letters A, B, and C represent 
the three varieties which apply to the whole row of the square while 
the groups of three numbers indicate the levels of nitrogen, phosphorous, 
and potash, respectively. To confound interactions of factors having 
three levels each Yates (5) has prepared a formal subdivision of the 
degrees of freedom. The portion of the N P K interaction designated 
as ‘“W” has been confounded with columns of the square and the ‘“X”’ 
component has been confounded with rows. Thus those components of 
the NP K V and NP KI and NP K VT have also been confounded 
with rows or columns of the squares because of the split-plot half-plaid 
features. The actual planting was done with a special planter to be 
described in a later publication. The yields from each of the two middle 
rows for each of the 24 feet long plots were recorded separately. This 
gave information concerning the within plot variability or sampling 
error. This procedure would not always be necessary but sometimes 
knowledge of the sampling error is desirable for use in future work in 


planning experimental designs. 
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TABLE 2. YIELDS OF U.S. NO. 1 TUBERS TABULATED BY TREATMENTS 
Irrigated 
{ B GC 
NPK | 

1 2 Total 1 2 Total 1 2 Total 

RPT eas 68 111 65 58 123 56 46 102 
112 74 63 137 62 82 144 50 62 112 
113 64 71 135 | 59 39 98 56 44 100 
121 74 88 | 162 52 64 116 70 65 135 
122 76 53 129 58 44 102 66 39 105 
123 53 70 | 123 65 34 99 34 50 84 
131 64 58 122 64. 37 101 68 56 124 
132 69 85 154 32 61 93 67 69 136 
133 43 86 129 60 40 100 62 76 138 
211 73 89 162 74 68 142 52 44 96 
FAW 81 68 149 77 48 125 64 89 153 
213 i 103 175 69 73 142 61 69 130 
221 80 60 140 33 93 126 72 69 141 
222 57 96 153 83 80 163 64 60 124 
223 95 65 160 7 35 105 57 70 127 
231 98 66 164 76 79 155 70 86 156 
232 81 67 148 87 78 165 67 64 131 
233 54 94 148 83 70 153 62 60 122 
311 62 86 148 69 84 153 bo 90 145 
312 92 88 180 (GE 84 161 71 72 143 
313 71 72 143 46 80 126 94 76 170 
321 15 70 145 Fee 74 151 76 41 the 
322 60 109 169 73 79 152 58 80 138 
323 94 88 182 77 96 13 78 62 140 
331 86 80 166 67 102 169 57 7 128 
332 79 90 169 65 92 157 76 60 136 
333 98 92 190 70 80 150 78 76 154 
1968 2125 4093 1790 1854 | 3644 1741 1746 | 3487 


11,224 


ANALYSIS OF DATA 


Table 1 is a tabulation of the yield in pounds of U.S. No. 1 tubers 
from each of the two plot rows, arranged to correspond to the planting 
plan of Figure 2. This arrangement of the data is needed for the calcu- 
lation of the total sum of squares and the sums of squares for rows and 
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TABLE 2—Concluded 
Not Irrigated 
A B 6; 
NPK = 
1 2 Total 1 2 Total 1 2, Total 

111 64 56 120 56 24 80 56 48 104 
112 55 46 101 56 42 98 46 al 117 
113 68 46 114 54 66 120 67 49 116 
121 64 72 136 41 60 101 68 45 113 
122 61 56 117 66 70 136 53 41 94 
123 58 38 96 64 52 116 59 65 124 
131 55 502 107 74 46 120 52 64 116 
132 56 50 106 56 61 117 67 33 100 
133 55 46 101 59 40 99 68 50 118 
211 68 62 130 60 52 ile, 60 62 122 
212 56 42 98 intl 62 113 64 63 127, 
213 70 42 112 57 68 125 56 64 120 
221 56 67 123 54 72 126 70 44 114 
222 61 51 112 58 72 130 57 70 127 
223 60 62 122 50 52 102 68 60 128 
231 56 56 112 50 68 118 64 62 126 
232 68 56 124 62 76 138 59 66 125 
233 “ih 52 123 72 53 125 63 78 141 
311 70 64 134 56 61 117 70 48 118 
312 "2 58 130 62 61 123 70 58 128 
313 58 67 125 55 56 111 51 54 105 
321 64 55 119 78 57 135 74 68 142 
322 60 60 120 46 59 105 59 67 126 
323 64. 82 146 62 69 131 54 58 112 
331 62 62 124 49 69 118 57 58 115 
332 64. 50 114 59 52 111 64 70 134 
333 64 62 126 58 67 125 54 64 118 
1680 1512 | 3192 1565 1587 3152 1650 1580 3230 

9574 


Block totals—10,394 and 10,404 


columns in the analysis of variance, Table 4. The data in Table 2 are 
tabulated as in Table 1 by adding the two single row figures for each 
plot and identifying the plot treatments from Figure 2, 
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TABLE 3. TREATMENT TOTALS TABULATED FOR COMPLETE FACTORIAL 
ANALYSIS 


Irrigated Non-Irrigated Total 
NPK - Total 
A B Cc Total A B 6 Total A B (6; 
111 111 123 102 336 120 80 104 304 31 203 206 640 


2 
112 137 144 112 393 101 98 Liz 316 238 242 229 709 
‘ 2 
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TABLE 3. Concluded 
Irrigated Non-Irrigated Total 
NPK - Total 
4 B C Total A B (6 Total A B Cc 

321 145 151 ih hs 413 119 135 142 396 264 286 259 809 
322 169 152 138 459 120 105 126 351 289 257 264 810 
323 182 173 140 495 146 131 112 389 328 304 252 884 
Total 496 476 395 | 1367 385 371 380 | 1136 881 847 775 2503 
331 166 169 128 463 124 118 115 357 290 287 243 820 
332 169 157 136 462 114 111 134 359 283 268 270 821 
333 190 150 154 494 126 125 118 369 316 275 272 863 
Total 525 476 418 | 1419 364 354 367 | 1085 889 830 785 2504 
3T1 459 473 390 | 1322 377 370 375 | 1122 836 843 765 2444 
3T2 518 470 417 | 1405 364 339 388 | 1091 882 809 805 2496 
3T3 515 449 464 | 1428 397 367 335 | 1099 912 816 799 2527 
Total | 1492 | 1392 | 1271 | 4155 | 1138 | 1076 | 1098 | 3312 | 2630 | 2468 | 2369 7467 
AWG 421 418 343 | 1182 384 309 344 | 1037 805 727 687 2219 
T12 466 430 408 | 1304 329 334 372 | 1035 795 764 780 2339 
T13 453 366 400 | 1219 351 356 341 | 1048 804 722 741 2267 
Total | 1340 | 1214 | 1151 | 3705 | 1064 999 | 1057 | 3120 | 2404 | 2213 | 2208 6825 
4 wal 447 393 393 | 1233 378 362 369 | 1109 825 755 762 2342 
1722 451 417 867 | 1235 349 371 347 | 1067 800 788 714 2302 
es 465 377 sol | 1193 364 349 364 | 1077 829 726 715 2270 
Total | 1363 | 1187 | 1111 | 3661 | 1091 | 1082 | 1080 | 3253 | 2454 | 2269 | 2191 6914 
T31 452 425 408 | 1285 343 356 357 | 1056 795 781 765 2341 
T32 471 415 403 | 1289 344 366 359 | 1069 815 781 762 2358 
133 467 403 414 | 1284 350 349 377 | 1076 817 752 791 2360 
Total | 1390-}.1243 | 1225 | 3858 | 1037 | 1071 | 1093 | 3201 | 2427 | 2314 | 2318 7059 
TT1 | 1320 | 1236 | 1144 | 3700 | 1105 | 1027 | 1070 | 3202 | 2425 | 2263 | 2214 6902 
TT2 | 1388 | 1262 | 1178 | 3828 | 1022 | 1071 | 1078 | 3171 | 2410 | 2333 | 2256 6999 
TT3 | 1385 | 1146 | 1165 | 3696 | 1065 | 1054 | 1082 | 3201 | 2450 | 2200 | 2247 6897 
4093 | 3644 | 3487 |11224 | 3192 | 3152 | 3230 | 9574 | 7285 | 6796 | 6717 20798 


The plot totals for the two replicates are then added and this total is 
transferred to a tabulation sheet set up as in Table 3. All of the sub- 
totals necessary for the complete analysis of the factorial components 
are then obtained, by proper addition. . 

The partition of the degrees of freedom as given in Table 4 follows a 
combination of split-plot and half-plaid Latin square breakdowns. 
The whole plots are the four irrigation plots and the three degrees of 
freedom are divided one for irrigation, one for blocks and one for error 


2) 
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TABLE 4. ANALYSES OF VARIANCE 
Degrees of Sum of 

Source of Variance freedom squares Variance 
Total 638 35,700.07 
Sub Total 323 31,002.07 |. 
Trigation (1) me 4,201.40 4,201.40 
Blocks 1 16 
Error (a) 1 301.47 301.47 
Rows 32 4,731.82 

Variety (V) 2 876.53 438 26 

Vi E 2 983 .38 491.69* 

Error (b) 4 275.94 68.98 
Columns 3 7,104.26.) |, ; 
Nitrogen (NV) 3,980.51 | 1,990.26*** 
Phosphorus (P) 129.17 __ 64,58 
NP +—— 112.28 28.07 
Potash (K) 30.62 15.31 
NK 103.00 25.75 
PK 109789 27.47 © 
NI 884.60 442 .30*** 
Pub 152.02 76.01 
WePpl, 15.70 3.92 
Kal 79.51 — 39.76 
NKI 
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(a) (irrigation X blocks interaction). The degrees of freedom for the 
sub-plots are then partitioned as for four Latin squares. The four 
sets of 8 degrees of freedom for rows from each square are combined to 
give 32 degrees of freedom total for rows. Because of the half-plaid 
characteristic some of the row effects are confounded with variety 
effects. The 32 degrees of freedom for rows are divided into 2 for 
variety, 2 for V J, 4 for error (b) and the remainder. Error (b) is made 
up of 2 degrees of freedom for V X blocks and 2 for V I X blocks. There 
are also 32 degrees of freedom for columns, obtained in the same manner 
as for rows. The rest of the degrees of freedom are obtained in the usual 
manner of obtaining all the main effects and interactions. The only 
exception is NPK, NPKV, NPKI and NPKVI. These are 
partially confounded with rows and columns and will not be calculated. 
Yates (5) gives a method for calculating the unconfounded portions of 
these interactions but the effort is rarely rewarded in cases of inter- 
actions of high order. Thus the confounded degrees of freedom of 
these N P K interactions are included in rows and columns and the rest 
are lumped together with error (c). One other characteristic of this 
particular experiment was that the records from one row of potatoes 
were lost. In Table 1 these have been listed as duplicates of the other 
row of the plot in the next to the bottom row of plots of Block 1. These 
9 degrees of freedom are thus lost from the sampling error and this 
accounts for 638 instead of 647 for total and 315 instead of 324 for 
sampling error. 
The computations are done as follows: 


1. The total sum of squares is obtained from the half-plot observations 
in Table 1 as follows: 


648 , 
Di. (267 -- 28". s SB (20798)" 
: 648 


2. The sub-total sum of squares is obtained from the plot totals in 
Table 2. 


5, 48" + 68" + ++ 64") _ (20798) 
: 2 648 


3. The sampling error is the difference between the sub-total and the 
total sums of squares. 


4. The irrigation sum of squares is obtained from the totals in Table 3. | 


(11224)* + (9574)? _ (20798)? 
324 648 
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5. The block sum of squares is obtained from block totals in Table 2. 
6. The four totals needed for the Block X Irrigation Interaction, 


which is error (a), are found in Table 1. 


(5499)* + (4895)? + (4679)? + (5725)? _ (20798)* _ SS. for 
162 648 blocks and 
irrigation 


~ 


. The row sum of squares is obtained from the row totals of Table 1. 


S.8S. for blocks, 
— irrigation and 
error (a). 


5» (600)" + (576)" + +++ (569)" _ (20798)" 
. 18 648 


8. The column sum of squares is obtained from the column totals of 
Table 1 in the same manner. 
9. Calculate the sum of squares for varieties using totals from Table 3. 


10. The VJ interaction is obtained by using the totals in Table 3. 


11. Error (b) is calculated as follows, using totals from Table 2. 


d : ; i : pa Lesa lor 
3 (1968)" + 212) EEC LL CULE SEO 
; error (a). 


12. The sums of squares for N, P, K and the interactions given in 
Table 4 are calculated according to standard factorial procedures 
(5) using appropriate totals from Table 3. For example: 


_ (6201)? + (7130)? + (7467)? - (20798)? 
SS. for VN = 216 ar Coes 
i *. (2032)? + (2088)? + --- (2504)? je (20798)” 
Sip. 106 Nip = 0, 79 ~ 519 


— SS. for N and P 


18 2 2 Ae 2 20798 2 
Oc for ViP Tes ye (1062)* + (970) ra (1085) es (20798)" 
T 36 648 


—§S. for VN, P, NP, I, NI, PI 


13. Experimental error (c) is obtained by subtraction of the components 
already computed from the sub-total sums of squares. 


Further details of computation may be found in Yates (5), Snedecor 
(4) under general description of analysis in factorial experiments, and 
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Cochran and Cox (1) Chapter 8. Significance tests and levels of sig- 
nificance can be found in Snedecor (4). 

The analysis may be carried further to single degree of freedom 
comparisons as described by Yates (5) and Snedecor (4). This step is 
used in interpretation but will not be described here. 


INTERPRETATION OF RESULTS 


The significance of the various main effects and interactions must be 
determined by comparison of the variances with the proper error 
variance. Irrigation can be tested only with error (a) and no signifi- 
cance is indicated. Since the design selected had a low precision for 
testing the effect of irrigation, small differences could not be detected 
in this experiment. The large variance due to irrigation might indicate 
that some advantage could be gained by analyzing the irrigated and 
non-irrigated sections separately. Since the separate analyses did not 
in this case change the interpretations they are omitted for simplicity. 

Even with the few degrees of freedom available the variety irrigation 
interaction was significant. The interpretation of this interaction is 
facilitated by plotting the means as in Figure 3. From this it is seen 
that without irrigation all varieties yielded about the same but with 
irrigation Variety A was much higher than B or C and there was no 
difference between B and C. Thus the interaction was the marked 
response of Variety A to irrigation compared to the lesser response of 
B and C. 

There was a significant response to nitrogen. Examination of the 
means (Table 5) reveals that there was a significant increase with each 


TABLE 5. RESPONSE TO NITROGEN APPLICATIONS 


Mean yield in lbs. 
Level of Nitrogen per plot row 
Ni 28.71 
Nz 33.01 
Ns 34.57 


increment of N even though the response to the second increment was 
much less than the response to the first increment. A more convenient 
method of analyzing the N effect is to break out the two individual 
degrees of freedom. This is logically done by the linear and quadratic 
terms as described by Yates (5). This divides the 3980.51 sum of 
squares for N into 3710.08 for linear and 270.41 for quadratic. Both 
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FIGURE 3. INFLUENCE OF IRRIGATION ON YIELD RELATIONSHIPS 
OF THE VARIETIES 


terms are significant when compared by F test to the error (c). This 
means that although there was a marked linear response to N the second 
increment was less effective than the first. 

Further examination of the analysis of variance indicates a signifi- 
cant N I interaction. Comparison* of this variance with the variance 


*This comparison is valid on the assumption that the levels of N and I are samples of an infinite 
population of levels. If these are considered as fixed effects such a comparison is not valid. 
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Irrigated 


8 


R 


Mean plot yields in pounds 
¢ 


Not irrigated 


8 


I TRE RET er 


3 
Levels of ¥, 
FIGURE 4. INFLUENCE OF IRRIGATION ON RESPONSE TO NITROGEN 


for N shows that N variance is not significantly larger than the N J 
interaction variance. Thus the effect of N must be interpreted only in 
terms of the irrigation level. The means for the N I interaction are 
plotted in Figure 4. Here it is seen that the response to V was much 
greater with irrigation than without. There seems to be a similar 
tendency toward less response from the second increment at both irri- 
gation levels. If the interaction is broken down to one degree of freedom 
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for linear N by irrigation and one for quadratie N by irrigation it will 
be found that the variance for linear N X Irrigation is 878.35 and for 
quadratic N X J is 6.25. Obviously the latter is non-significant indi- 
cating the same reduction in response to the second increment of N 
with and without irrigation. The linear responses indicate a much 
greater response to NV with irrigation than without. Even though the 
reductions in response from the second increment of N were the same 
with and without irrigation it is evident that with irrigation the limit 
of economic N application will be much greater than without because of 
the greater response in each increment. 

The significance of the VN P I V interaction poses some new problems. 
First the N J variance and the V J variance should be compared with 
N PIV variance to determine whether N J and VI can still be in- 
terpreted as above in spite of the influence of P and V in them. NJ 
variance is significantly larger than N PIV variance so the above 
interpretation can stand. V TJ variance is not different from N PIV 
so the V J interpretation must be altered. The N PJ V means are 
plotted in Figure 5 for ease of interpretation. Without irrigation there 
was little response to N. However, with irrigation there was response to 
N and the degree of response varied with variety and phosphorus level. 
Variety A responded best to NV at P; , Variety B responded best at P, or 
P, , the response at P, being linear, but at P3 a reduced effect was found 
for the second increment of N. Variety C responded best to N at P, . 
Thus the response of the varieties to N with irrigation must be inter- 
preted with respect to specific varieties at definite levels of P. 

One other point of interest is the comparison of the sampling error 
and the experimental error (c). Using the notation found in Snedecor 
(4) it is found that Sz = 11.21 and Ss; = 14.91. The estimate of the 
variance of a treatment mean is: 

5 SE Re ee 

Dee Cal Rive ee Rk } 
where R is the number of plots of the treatment and k is the number of 
rows per plot. It is evident that increasing replications or size of plot 
will both reduce the treatment mean variance. However, it would seem 
that increasing replications will be more advantageous in this case since 
R is present in both terms of the equation and the two terms are the 
same size. If S2 was markedly greater than S; then there also would 
be some advantage in increasing the number of rows per plot. Whether 
additional precision is desired depends on the objectives of the experi-_ 
ment. For this experiment, destined to be continued for several years, 
- the degree of precision seems adequate. 
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DISCUSSION AND SUMMARY 


That this design has succeeded in removing a large portion of field 
variability was confirmed by the very large components of variance 
associated with columns and with rows exclusive of varieties. The 
gain in precision may be calculated. Assuming that the irrigation plots 
would have to be split for varieties and the fertilizer treatments then 
randomized on each variety plot, and assuming that the N P K inter- 
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actions are not significant, the error variance for such a design would 
be the rest of the rows plus columns plus error (c). This gives a new 
error variance of approximately 75. Dividing 37 by 75 we obtain 49 
per cent as the efficiency of an unconfounded arrangement. The re- 
ciprocal of this is 204 per cent and the gain in information due to con- 
founding is 104 per cent. 

This paper is a description of the split-plot half-plaid Latin square 
in a field irrigation experiment with fertilizer and potato varieties. 
Some of the major considerations which led to the selection of the design 
are discussed. A description is given of the method of analysis of the 
data and some of the interactions are discussed. The use of the half- 
plaid Latin square feature has resulted in a gain of 100 per cent in 
information compared to the split-plot design. For a complete interpre- 
tation of three years’ data from this experiment the reader is referred 
to (3). . 
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FITTING THE NEGATIVE BINOMIAL DISTRIBUTION 
TO BIOLOGICAL DATA 


Ole Brass 


The Connecticut Agricultural Experiment Station and 
Yale University 


NOTE ON THE 
EFFICIENT FITTING OF THE NEGATIVE BINOMIAL 


R. A. FIsHER 


Department of Genetics 
Cambridge 


In studying the occurrence of plants and animals in nature, the 
number of individuals may be counted in each of many equal units of 
space or time. The original counts can be summarized in a frequency 
distribution, showing the number of units containing « = 0, 1, 2, 
3, --- individuals of a given species. If every unit in the series were 
exposed equally to the chance of containing the organism, the dis- 
tribution would follow the Poisson series, each unit having the popula- 
tion mean as its expected frequency. It is easy to test whether the 
variation in the number of individuals per unit agrees with this hypo- 
thesis. Since the expected variance of a Poisson distribution is equal 
to its mean, the observed variance s’, multiplied by the degrees of 
freedom n, may be divided by the sample mean é to obtain x” = ns’/2. 
More often than not x” is significantly larger than its expectation, 
not only in distributions of plants and animals in nature but even in 
the laboratory. 

A number of distributions have been devised for series in which the 
variance is significantly larger than the mean (2, 11, 21), frequently on 
the basis of more or less complex biological models. In the present 
paper this characteristic will be called ‘over dispersion’. Perhaps the 
first of these was the negative binomial, which arose in deriving the 
Poisson series from the point binomial (27, 32) although it had been 
formulated in 1714 (2). Comparisons of expected and observed dis- 
tributions have shown its wide applicability to biological data. The 
relative ease with which the negative binomial can be computed and 
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other desirable properties that have been described by Fisher (12) and 
by Anscombe (2) have now been supplemented by a practicable maxi- 
mum likelihood estimation of its parameters described in the note 
by Sir Ronald Fisher at the end of the present paper. Here we will 
consider some characteristics of the negative binomial, estimates of its 
parameters and their precision, the calculation of the expected fre- 
quencies for a given sample, and its applicability to biological data. 

The Negative Binomial Distribution. The negative binomial distri- 
bution is completely defined by two parameters, the arithmetic mean m 
and a positive exponent k. It is so called by analogy with the positive 
binomial distribution, (¢ + p)", where n’ is the number of individuals 
in a group and gq and p are the expected proportions in two contrasting 
categories with g + p = 1. In computing the statistics of the positive 
binomial from distributions of yeast cells counted with a haemocy- 
tometer, Student (27) observed that two of his series gave negative 
values for p and n’ but nevertheless fitted his observations very well. 
These and other cases (32) were described by the negative binomial, 
(q — p) *, where p = m/k and gq = 1+ p. By expansion of this ex- 
pression, the probability P, that an observational unit x will contain 
0, 1, 2, --- individuals is 


_@&+2- IR 
ST IE w) 


where R = p/qg = m/(k + m). The probability for a given zx is multi- 
plied by N, the total number of units counted, to obtain the expected 
frequency (¢) of units with x individuals. The curve defined by the 
P.’s (or ¢’s) is unimodal, so that in fitting the negative binomial to an 
observed distribution any apparent bimodality (or multimodality) is 
attributed to random sampling. 

The negative binomial is an extension of the Poisson series in which 
the population mean m, the parameter of the Poisson distribution, is 
not constant but varies continuously in a distribution proportional to 
that of x’. As the variance of a negative binomial approaches the mean, 
or the over-dispersion decreases, k © and p— 0. Under these con- 
ditions it can be shown (13) that the distribution converges to that for 
the Poisson, P, = e™ (m7/x!). Conversely, if the over-dispersion in-——~ 
creases sufficiently, k — 0. If we disregard the number of units con- 
taining no individuals, the negative binomial then converges to Fisher’s 
logarithmic series (13), which describes effectively the apparent abund- 
ance of different species. Thus the limiting values of the exponent lead 
to distributions of importance in biology, ' 
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An example of the negative binomial is provided by counts of the 
number of European red mites on apple leaves, for which I am indebted 
to Dr. Garman of The Connecticut Agricultural Experiment Station 
(15). On July 18, 1951, 25 leaves were selected at random from each of 
six McIntosh trees in a single orchard receiving the same spray treat- 
ment, and the number of adult females counted on each leaf. The 
frequency distribution of mites on the 150 leaves is given in the first 
two columns of Table 1. 


TABLE 1 


Fitting the negative binomial to counts of red mites on apple leaves, 
data of P. Garman (15). 


No. of mites No. of leaves | Accumulated Expected (f — ¢)? 
per leaf observed frequencies frequencies 

z f A; ¢ ¢ 

0 70 80 69.49 .004 
1 38 42 37.60 .004 
2 17 25 20.10 478 
3 10 15 10.70 046 
4 9 6 5.69 1.925 
5 3 3 3.02 

6 2 1 1.60 027 
7 1 .85 

8+ 0 .95 

Total 150 = N 150.00 2.484 = x? 


S(fx) = 172, S(fx?) = 536, S(fx*) = 2170 
& = 1.14667 (Eq. 2), s*? = 2.27365 (Hq. 4) 


As a test of agreement with the Poisson series, the observed 
variance (s”) has been computed from the basic sums beneath the 
table. It was nearly twice as large as the mean. From x” = 149 x 
2.27365/1.14667 = 295.44 with 149 degrees of freedom, P is less than 
001. The over-dispersion was clearly far too large for the Poisson 
series. By contrast, the frequencies expected by the negative binomial, 
shown in the fourth column of Table 1, reproduced the observed values 
very closely. In the next sections, the detailed computation of the 
negative binomial will be illustrated with this example. 

The Statistics of the Negative Binomial. The parameters m and k of 
the negative binomial are estimated from the frequency distribution of 


i  — 
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a sample by the statistics @ and k. The mean is estimated efficiently 
from the frequency f of units at each x as 


= = S(fx)/N (2) 


For the distribution in Table 1 its numerical value was = 1.146667. 

The exponent & is more difficult. Two approximations (methods 1 
and 2) are available, each with a relatively high efficiency for suitable 
combinations of m and k (1, 2). With the maximum likelihood solution 
(method 3) in the note appended to this paper, either estimate may be 
used as a first step toward a fully efficient fitting. 

(1) The original and simplest solution for the statistic k is that 
based upon the first and second moments (12, 32). It is determined 
from the mean and variance s’ of the sample as 


Pe 3 
1 2 Zz ( ) 


2 _ S(fz’) — S*(fx)/N 
rates N-1 od 


The efficiency of the moment solution as defined by Fisher (12) has 
been plotted by Anscombe (1, 2) for different combinations of m and k. 
It has an efficiency of 90 percent or more for small values of m when 
k/m > 6, for large values of m when k > 13, and for m in the inter- 
mediate zone when (k + m) (k + 2)/m > 15. In practice the popula- 
tion values m and k are replaced necessarily by the statistics ¢ and k. 

In our example, the estimation of i, leads by Eq. 3 to ky = 
(1.14667)’/(2.27365 — 1.14667) = 1.16670. If k, has been estimated 
with an efficiency of 90 percent or better, the inequality 2.313 xX 
3.167/1.147 > 15 should hold, but since 6.39 } 15, this method is not 
suitable for the present series. 

(2) An alternative estimate (1, 2) is based upon the ratio of the total 
number of units in the sample (N) to the number of units without 
organisms (f)). From Eq. 1 the expected probability for x = 0 is 
P, = 1/¢*, and if P, is replaced by the proportion observed in the zeTO 
class, we have fo/N = 1/gq". Since g = 1 + m/k, an iterative solution 


is necessary. The required estimate of kk is that which balances the —— 


equation : 
kt, log (1 + &/kz) = log (N/fo) (5) 


The left side of Eq. 5 is computed twice, with different trial values 
k’, one giving a larger and the other a smaller product than the con- 
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stant term on the right of the equality. Interpolating between these 
two products for log(N/fo) leads to a first approximation of ky , with 
which the process can be continued until k> has the required precision. 
To estimate & with an efficiency of 90 percent or more, at least 1/3 of 
the units must be empty, but if the mean is less than 10, enough more 
empty units are required to satisfy the inequality (m + 0.17) (Po — 0.32) 
> 0.20. The terms in the inequality are replaced necessarily with 
m = €and P, = f,/N from the sample. 

Since the number of zero frequencies in Table 1 is relatively large, 
method 2 is the more promising. As a test of its expected efficiency, 
we find (x + 0.17) (fo/N — 0.32) = .193, which is very close to the 
level for 90 percent efficiency. Fr om Eq. 5 the required ke is the trial 
value k’ for which k’ log(1 + #/k’) = log(150/70) = .330993. The 
first trial value of k’ = 1 was based on the estimate from method 1 and 
the computation arranged as follows: 


k! 1+ &/k’ k! log (1 + &/k’) 
1 2.14667 331767 
98 2.17007 329744 
992 2.15592 330962 


With ki = 1, the left side of Eq. 5 was too large, so that the calculation 
was repeated with kj = .98, reversing the inequality. By interpolation 
ky = .98 + .02 (.3830993 — .329744)/(.331765 — .329744) = .992, 
which slightly underestimated the required value. By interpolation 
between kj and kj , k. = .99231. 

(3) For many distributions, k cannot be determined by either of 
the above techniques with an acceptable efficiency. In these and all 
critical cases, the k computed by Eq. 3 or 5 may be considered as the 
first step toward a definitive solution. This is provided by the method 
of maximum likelihood (17, 24). By suitable arrangement of the calcu- 
lation, as developed by Sir Ronald Fisher in the appendix to the present 
paper, the procedure is practicable and rapid when the largest observa- 
tion does not exceed 20 or 30. Scores (z;) are computed from trial values 
of ki , selected so that they bracket the required estimate k, for which 
2, = Oin the equation 


eS s742-) - vin(i +4) © 


where In designates a natural logarithm. As a first step in the compu- 
tation, the accumulated frequency A, in all units containing more than 


tate 
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« organisms is written opposite each x. The reciprocals 1/(ki + 2) 
from Barlow’s Tables (to seven places or more) are multiplied in turn 
by A, and the products accumulated to obtain the summation in the 
first term. The second term may be determined as the seven-place 
common logarithm of (1 + ¢/k;) multiplied by 2.3025851 « N. 

The first score z, (¢ = 1) is computed with the first trial value lige 
usually based upon the & from Kq. 3. The second trial value k depends 
upon the sign of z,. If 2, is positive, kj > kf ; if negativ e, ki < ki, the 
two differing just enough to give opposite signs to z, and z,. The 
third trial value ki , between k{ and kj , is obtained by linear inter- 
polation for z = 0, but the computed z; is rarely exactly zero. For 
precision, it is preferable to compute a 2, with a sign opposite to that 
of z; by selecting ki at about the same distance as ki beyond a newly 
interpolated k’ for z = 0. This provides a narrower interval within 
which the final: may be interpolated and a better estimate of its 
variance obtained. 

Since the better of the two approximations of & in our example was 
barely 90 percent efficient, the likelihood solution would be preferred. 
The cumulative frequencies exceeding each x were first listed (Table 1), 
so that for x = 0, for example, A, = 150 — 70 = 80, and for z = 1, 
A, = 80 — 38 = 42. Starting with a trial value of ki = 1.0, the re- 
ciprocal, 1/(k{ + x), for each x was multiplied by its corresponding A, 
and the products accumulated in the calculator to obtain the first term 
in the score z (Eq. 6). This sum has been listed separately and below 
it the second term in the score, 345.3878 & log(1 + 1.146667/k’): 


m=10 k&=1.05 ky = 1.026 k, = 1.023 

S{A./(k’ + 2)} 114.9262 110.4045 112.5247 112.7961 
—NIn(i+ #/k’) —114.5875 —110.7227 —112.5432 —112.7752 
2; 3387 — 3182 = 0155 0209 


Since z, was positive, the second trial k’ was increased to ki = 1 05, 
leading to a negative z. . Interpolating between them for z = 0, 
kf = 1.0 + (.3387 X .05)/(.3387 + .3182) = 1.026, which, in turn, gave 
z; = —.0185. From linear interpolation between z, and z; for z = 0, 
i = 1.0247. To insure a positive score near zero, a new trial value was 
selected of ki = 1.023. Interpolation between z, and z, gave the maxi- 
mum likelihood estimate of & = 1.02459. 

The Variances of = and k. The sampling variances of the statistics 
of the negative binomial depend upon the parameters m and k, but in 
practice these are replaced necessarily by the corresponding statistics 


am 
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and k. The mean is computed efficiently in all cases and its variance 
is 


v@) = (m+ ™)in (7 


The error variance of the mean in the example from Table 1, computed 
with the maximum likelihood estimate of k, was V(@) = (1.14667 + 
1.28330) /150 = 01620, giving the standard error s; = .1273. 
The variance of k i. depends upon how it has been estimated. In 
general, the variance of & is less when k is small than when it is large. 
(1) If computed from s” by Eq. 3, its large-sample variance (2) is 


2h(k + 1) 
NR? 


solved with R = a@/(k + @). For k, = 1.16670 from Eq. 3, R = 
1.14667/2.31337 = .49567, and V(k,) = 5.0558/36.853 = .1372, from 
which this estimate of k has a standard error V .1372 = .370. 

(2) If & is determined from the number of zero units by Eq. 5, its 
large-sample variance (2) is 


Viki) = (8) 


R)*—1-—kR 


= a 


where Ff is defined as above and In is a natural logarithm. The error 
variance in the example for ke = .99231 as estimated by Eq. 5 is solved 
with R = 1.14667/2.13898 = .53608, to obtain V(k.) = .61089/8.0709 = 
.07569. The standard error of k2 is 0.2751, or about 3/4 as large as 
that for hy : 

(3) The variance of the maximum likelihood estimate of k& is the 
reciprocal of the amount of information about k, or the rate at which 
the score z is decreasing as it passes the zero. It is computed, there- 
fore, from the two values of z; just above and below zero, say 23 and 2, , 
and the two trial values of k{ with which they have been computed, ki 
and kj , as 


ks — ks 
24 — 23 


Vii) = (10) 


Hence the error variance of k = 1.02459 is 


-V(&) = (1.026 — 1.023)/(.0209 + .0185) = .07614, 
so that the maximum likelihood & had a standard error of Se =. 2759, 
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Tests for Agreement with the Negative Binomial. Perhaps the most 
convincing test of the adequacy of the negative binomial in any given 
case is the agreement between the frequencies (f) observed at each x and 
their expected values (¢) as computed from the statistics of the sample. 
The discrepancy between them is easily tested by x’. 

The expected frequencies are computed with Eq. 1, most readily in 
succession and starting with the number expected at x = 0. This first 
expectation is determined with the aid of 7-place logarithms as 


go = N/q (11) 
and the succeeding entries for z = 1, 2, 3, --- as 
k+2—1R 
g. = ETL, (12) 


More decimal places should be retained in the calculator at each stage 
than need be recorded, so as to avoid accumulating rounding errors. 
The observed and expected frequencies are then compare by x’, where 


ever “9 


x’ has three fewer degrees of freedom than the number of ratios that 
are summed. As usual, the frequencies with small expectations are 
pooled, preferably so that no expectation is less than 5. If x’ shows 
good agreement between the matched frequencies, no other test may 
be needed, especially if both < and i: are efficient estimates. 

In the example, the expected frequency for x = 0 was determined 
with Eq. 11 as the antilog of log (150) — 1.02459 log (2.11915) or 
¢) = 69.4879, giving the initial entry in the fourth column of Table 1. 
From p = 1.11915 andqg = 1+ p, R = 1.11915/2.11915 = .528113 
and by Eq. 12, ¢, = 1.02459 X .528113 X 69.4879 = 37.5999, ¢. = 
2.02459 X .528113 X 37.5999/2 = 20.1011 and so on for successive 
values of xz. To avoid rounding errors, six significant figures were 
carried in the calculation, although the ¢’s were recorded only to two 
decimal places. The final value, for = 8+, was obtained as the 
difference between 150 and the sum of preceding ¢’s. The calculation 
of x’ for the discrepancy between the observed and expected frequencies: 
(Eq. 13) is shown in the last column of Table 1, pooling the frequencies 
for z > 5 so as to avoid expectations of less than ¢ = 5. The resulting 
x° = 2.484 with three degrees of freedom and P = 0.48 indicates good 
agreement with the negative binomial. 

The comparison of observed and expected frequencies by x’ may be 
distorted by chance irregularities in the individual entries. Thus in 


s 
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testing agreement with Neyman’s contagious distribution, Beall (6) 
“smoothed” some of his more uneven observed frequencies before 
computing x’. This difficulty can be avoided by testing the agreement 
of the observed with the expected second and third moments of the 
negative binomial. The relation of these tests to alternative formula- 
tions, such as the logarithmic, the discrete log-normal, and the “con- 
tagious” distributions, has been considered by Anscombe (2). The 
moment tests have the further advantage that they take account of the 
few large values which are missed by grouping the tail of an observed dis- 
tribution in computing x’. 

Two tests have been described by Anscombe (2), each having the 
form. of a difference between an observed and an ‘expected’? moment. 
Although m in each test is estimated efficiently by ¢, its variance has 
been derived on the assumption that an efficient estimate of k is not 
available for computing the expectation. A variance so derived may not 
apply when the expected moment is estimated by maximum likelihood. 
Hence the test criteria 7 and U are defined in terms of the estimates for 
which the variances are known. These variances, however, should 
always be computed with the best available estimate of k, usually that 
derived by maximum likelihood (k). 

The difference 7 between the third moment of the sample and its 
value predicted from the first two moments of the same sample of a 


negative binomial is 
3 5 9 2 
7m 21 | a) 


where 
[x*] = S{f(e — #)*} = S(fx*) — 38(f2’) + 22’S(fz) (14) 


The significance of the difference T is determined by comparison with 
its standard error, the square root of its large-sample variance 


V(T) = 2m(k + Ip'q[2(3 + 5p) + 3kq]/N (15) 


The variance of 7’ should be computed with estimates of p, g and k 
based upon the maximum likelihood & when it is known. Although the 
expected third moment may be determined more accurately from the 
same maximum likelihood estimates as ¢(q + p)m, the variance in. Eq. 
15 is then of doubtful applicability. 

When the observed second moment is compared with its expectation 
computed with ky , the difference 


Ge 8 (8 o/s) (16) 


NEGATIVE BINOMIAL DISTRIBUTION I&5 
has the large-sample variance 


PT eek = : i ae - 
V(U) = 2m(k + 1)pq (1 ae (een o 5) / N + p'V(ks) (17) 


V (kz) is defined in Eq. 9 but computed with the maximum likelihood 
estimate k if this is known, as are the other terms in Eq. 17. Here 
again the expected second moment, gm, can be estimated more exactly 
with the maximum likelihood value for k but the applicability of the 
variance in Eq. 17 is then in doubt. 

The differences between the observed and expected moments are 
much easier to compute than their standard errors. From [z*]. = 
778.47 (Eq. 14) the present example had an observed third moment 
of 5.1898 and an expected value from the first two moments, ¢ and s” of 
6.7429, so that T = —1.553 by Eq. 18. Since its variance by Eq. 15 
was 4.1272, the standard error of 7’, 2.032, showed no discrepancy from 
a negative binomial. The corresponding difference for the second 
moment and its error has been computed by Eqs. 16 and 17, to obtain 
U = —0.198 + 0.302, again in agreement with the negative binomial. 

Models for the Negatiwe Binomial. When the number of individuals 
per unit of space or time in repeated counts cannot be assumed to have 
the same expected value, they may represent a mixture of several 
homogeneous Poisson distributions. The number in each unit is re- 
stricted to the integers but this is not true of the expectations or means. 
In a mixture of Poissons, the means represent a positive continuous 
variate. The simplest frequency distribution which they might follow 
is the Eulerian distribution or the Pearson type III curve, and if in fact 
they are so distributed the observations will conform to the negative 
binomial. 

The expected frequency may be known to vary within an observed 
distribution. A case in point is the distribution of bacterial clumps 
over a milk film (20). At least two disturbing factors were involved. 


In preparing a film, 0.01 milliliter of milk was placed on a microscopic 


slide and spread with a needle over an area of one square centimeter. 


Bacteria caught by surface tension on the lower surface of the drop | 


adhered to the glass slide on contact and thus increased the concentra- 
tion of bacteria in this area of the film. Secondly, the fresh drop did 


not have the same thickness over the entire square centimeter but due 


to surface tension was thicker in the center than in the margins, so 
that more bacteria were deposited in the center. Despite these two 
factors, milk meeting public health standards had so low a bacterial 


count that the distribution of bacterial clumps per microscopic field — 


was seldom distinguishable from a Poisson. However, when the 
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TABLE 2 


Observed distributions of bacterial clumps per field (Obs. f) in a milk film (20) and of 

yeast cells per square in a haemocytometer (27) and the expected frequencies, com 

puted from the negative binomial (Bin ¢) and from the Neyman type A (Ney ¢) 
distributions (21). 


No. per Bacterial clumps Yeast cells 
unit Obs. Bin. Obs. Bin. Ney. 
© 3 ¢ f ¢ ¢ 
0 56 64.2 213 214.2 214.8 
1 104 90.3 128 122.8 121.3 
2 80 82.7 37 45.0 45.7 
3 62 62.1 18 13.4 13.7 
4 42 41.6 3 3.5 3.6 
5 27 25.8 1 9 8 
6 9 15.1 see spall 
7 9 8.5 
8 5 4.7 
9 3 2.5 
10 2 1.3 
il +1.2 
19 1 
N, P(x?) 400 54 400 i19 .18 


bacterial count was high, the distribution of clumps per field reflected 
this known heterogeneity and then was often a negative binomial, as 
in the example in Table 2. This phenomenon of substantial agreement 
with the Poisson at low population densities and with the negative 
binomial at higher densities has been observed with both plant and 
animal populations (4, 5). A lack of randomness in microscopic counts 
was observed by Student in counting yeast cells with a haemocyto- 
meter (27). In fact, this seems to be the first case to be fitted with a 
negative binomial. The original counts are given in Table 2, together 
with the expected negative binomial frequencies as computed by 
maximum likelihood and those computed by Neyman (21) with his type 
A contagious distribution. 

The distribution of insect pests is so seldom uniform that most 
experiments on insect control are randomized and replicated. In. a 
field experiment of this type on the corn borer, four treatments were 
arranged in 15 randomized blocks (26). At the end of the season, 
eight hills of corn were selected at random in each plot and the borers 
recorded from each hill. This experiment has been reported both in 
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TABLE 3 
Distribution of corn borers (Obs. f) in a field experiment arranged in 15 randomized blocks, where 
treatment 1 is the untreated control or check (6). The expected frequencies for the negative binomial 


(Bin. ¢) have been computed independently for each treatment with the statistics in the last row of the 
table; those expected for the Neyman type A (Ney @) are from Beall (6). 


Borers Treatment 1 Treatment 2 Treatment 3 Treatment 4 
per hill Obs. Bin. Ney. | Obs. Bin. Ney. | Obs. Bin. Ney. | Obs. Bin. Ney. 
x f co) ? f ’ ’ uf ’ co) f ’ g 
0 19 16.6 34.4 24 19.6 22.6 43 44.3 49.8 47° 45:3 53.4 
iz 12° 185-764 16.) 22.2 Gey 35) 3121 23:3 VEY ON kOe 
2 18 16.9 10.4 16 IS? WSIS die 19 1s 18:9) 21, h8.4 1725 
3 18 14.5 11.9 189 15.9) 624 11 V2" 1243 Oi Oot 
4 11 11.9 11.2 Se 2233) M324 5 624°°":7,3 iG 6.4 7.5 
5 12 9.5 9.5 9 9.0 10.3 4 36" 4d 34 Buk (424 
6 7 (Pe ee 6 6.0 * f209 1 220! 252 1 Aad eee Oo 
7 8 5.9 6.4 5 4.6 5.2 2 ikea s ipegtE 1 1i25 41,4 

8 4 25 Be 3 See aie 2 
9 + Ba) A 4 2.3 2.3 +1.2 +1.0 +1.8 +1.5 
10 1 247. “s.2 3 TE i a 1 
Bi 2.0 2.5 1% 9 it 
12 1 1.5. 1.9 1 
13 1 1D: yi oA. +2.1 +1.4 
15 1 
17 1 +3.3 +3.6 
19 1 
26 1 
N, P(x?) 120 .65 .002 | 120 66 .98 120 .88 .09 120 16 .09 
ae = 4.0383 1.532 3.167 1.764 1.483 1.333 1.508 1.190 


terms of the total number of borers per plot (26) and as frequency dis- 
tributions showing for each treatment the number of hills with z = 1, 

- borers (6). From an analysis of variance of the plot totals (in 
logarithmic units) the level of borer infestation varied significantly 
from block to block (P < .01). In consequence, the composite dis- 
tributions from the 15 plots for each treatment (Table 3) represented 
unequal levels of infestation. Negative binomials have been fitted 
separately to each of them. Since the expected frequencies agreed well 
with the observed values, the data are consistent with the hypothesis 
that each represented the sum of several Poisson distributions of 
unequal means. 

The distributions in Table 3 might arise from a Widevent model 
for the negative binomial in which the non-randomness is attributed to 
“contagion”, in this case, a result of the larvae hatching from eggs 
that were laid in masses. Contagion, in fact, was the basis for the 
Neyman type A distribution developed originally for these observations 
(21, 6) as described in the next section. With a different mathematical 
formulation it leads also to a negative binomial. ‘‘Contagion’’ was one 
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TABLE 4 
Distribution of the number of accidents experienced by machinists (f) and their negative binomial 
expectations (p) (16). Distribution of soil bacteria in microscopic counts, showing the colonies per 


field fitted with the Poisson series, the bacteria per colony with a logarithmic series, and the bacteria 
per field with a negative binomial (18). 


Accidents No. of Colonies No. of Bacteria No. of Bacteria No. of 
per ma- machinists per fields per colonies per fields 
chinist wf (0) field Obs. Cale. colony Obs. Cale. field Obs. Cale. 

0 296 296.7 0 11 14.6 1 359 362.1 0 11 13.0 
1 74 71.0 1 37 40.9 2 146 §=186.1 1 17 21.0 
2 26 26.4 2 64 57.2 3 57 68.3 2 31 24.6 
3 8 11.0 3 55 53.4 4 41 38.5 3 24 25.4 
4 4 4.8 4 37 37.4 5 26 2322 4 29 24.2 
5 4 2p2 5 24 20.9 6 17 14.5 5 18 22.0 
6 ib 1.0 Gia= UY 15.6 Wage 2x! 30.3 6 19 19.4 
aa 5 if 16 16.7 
8 if Bo 8 13 14.1 
9 +.2 9 17 LET 

10 6 9.6 

11 8 7.8 

12 31 30.5 

N, P(x?) 414 Ot 240 63 673 56 240 Oe 


of the explanations proposed by Student (28), who wrote, “If the 
presence of one individual in a division increases the chance of other 
individuals falling into that division, a negative binomial will fit best, 
but if it decreases the chance, a positive binomial”. This explanation 
has figured prominently in the study of accident statistics (3, 16). 

An early example is the distribution in Table 4 of accidents ex- 
perienced by 414 machinists in three months, where the observed 
frequencies are matched satisfactorily by those computed with a negative 
binomial. If each machinist had had the same initial probability of 
being involved in an accident but if this probability were increased (or 
decreased) by his having an accident, contagion would be present and 
a negative binomial distribution could result. However, an opposite 
assumption leads to exactly the same expected distribution. If ex- 
periencing an accident had no effect upon the risk of another accident, 
but if the individual machinists or their shops or intervals within the 
three month period differed in their accident-proneness, a negative 
binomial would also result. Hence, the appearance of ‘‘contagion is not 
inherent in nature but simply in our method of sampling” (11). The 
relation of these two models, a mixed or compound Poisson distribution 
without contagion and contagion which changes the odds of further 
events, is discussed ably in the recent monograph by Arbous and Kerrich 
(3). Alternative distributions have been developed from other math- 
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TABLE 5 
Observed distributions of quadrat counts (Obs. f) of Lespedeza capitata and of Liatris aspera in an old 
field association (30) and of Primula auricula in a grassland association (7), and their expectations with 


the negative binomial (Bin. ¢), Neyman contagious type A (Ney @) and Thomas double Poisson 
(Thom @) distributions, 


Plants Lespedeza capitata Liatris aspera Primula 
per quadrat Obs. Bin. Ney* Thom Obs. Bin Thom Obs. Bin 
x fi 2 2 ’ ty ? ? i) Co) 
0 7178 WITS. A OTISS 4. F279i-2 7403 7403.1 7420.4 26 23.6 
1 286 283 .7 219.6 105.2 183 179.8 140.0 21 26.1 
2 93 95.0 140.8 127.9 34 40.0 62.3 23 20.9 
3 40 41.1 61.6 78.6 14 11.5 14.4 14 14.7 
4 24 19.8 21.4 33.1 4 3.7 2.4 11 9.5 
5 rf 10.2 6.2 : Te eb 1 1S) 0.4 4 5.9 
6 5 5.4 I 3.4 if 4 ues 5 3.5 
rf 1 2.9 5 1.0 +.2 4 2.1 
8 2 1.6 a +.5 1.2 
9 i 9 1 Jef 
10 2ey > clipatss 
11 1 a 
12 +.5 
N, P(x?) 7640 .84 <.001 <.001 7640 46 <.001 109 .70 


*Computed by maximum likelihood (30), all other Neyman Type A expectations fitted by moments. 


ematical definitions of ‘‘contagion’”’ and will be considered in the next 
section. 

The negative binomial may result from a different but related 
model. In counts of soil bacteria, Jones and Mollison (18) recorded 
for each microscopic field both the number of bacterial colonies and 
the number of bacteria in each colony. The number of colonies per field 
agreed well with the Poisson expectation for a random distribution and 
the number of bacteria per colony in the same counts with Fisher’s 
logarithmic distribution. Under these conditions, the expected distri- 
bution of bacteria per field is a negative binomial (23), as confirmed 
by the agreement of the expected and observed frequencies (Table 4). 

Quadrat counts in plant ecology which departed from Poisson have 
been attributed to the occurrence of plants in “clumps”. Blackman (7) 
noted that Primula auricula reproduces vegetatively by short rhizomes, 
so that older individuals are often surrounded by younger plants. In 
an old-field community (30) both Liatris astera and Lespedeza capitata 
tended to occur in clumps. The distributions of clumps per quadrat 
and of plants per clump have not been reported separately, so that the 
model cannot be tested directly. However, the distribution of plants 
per quadrat in all three cases agreed excellently with the negative 
binomial (Table 5). 
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TABLE 6 
Observed animal distributions (f) and their negative binomial expectations (p) of Microcalanus nauphi 
in samples of marine plankton (also fitted with a Neyman type A) (5), of Tanytarsus in Ekman hauls 
(19), of Oligochaetes in Petersen hauls (19), of isopods under boards (9), and of the mite Liponysus 
bacoti on rats in Savannah (10). 


Individuals Microcalanus Tanytarsus | Oligochaetes Tsopods Mites 

per unit Ney 
z i ? ? zi ? f ? f ’ if ¢ 
0 a 8 32 29.5 39 34.7 28 30.2 | 160 160.0 
1 2 .8 1,9 28 32.5 24 29.6 28 21.6 19 15.9 
2 4 2.1 3.7 25 29.0 118 23.6 14 16.2 11 8.5 
3 3 4.3 5.8 34 23.9 21 18.3 a 12.3 6 5.8 
4 5 zis 8.0 13 18.9 15 14.1 Fs 9.4 5 4.3 
5 8 9.9 10.0 145) 14.46: £5 OT 11 7.3 4 3.4 
6 16 12.4 L156 17 10:.9 6 8.2 2 5.6 4 240. 
a 13 14.0 12.5 5 8.1 8 6.2 3 4.3 3 2.4 
8 12 14.7 12.9 6 6.0 6 4.6 3 3.4 2 2.0 
9 13 14.5 Tee 1 4.4 P. 3.5 3 2.6 2 1.8 
10 15 13.5 11.9 9 3.2 1 2.6 3 2.0 1.6 
11 15 12.0 10.9 2.3 2 2.0 2 136 1.4 
12 9 10.3 9.6 up 3 1.5 1.2 us 1.2 
13 9 8.5 8.2 2 1.2 3 p ime | 1 9 Ey | 
14 7 6.8 6.8 a 9 8 2 ff 1.0 
15 4 5.2 5.5 6 1 6 1 6 2 9 
16 4 4.0 4.4 4 +1.9 4 8 
17 6 2.9 3.4 iL 3 2 3 8 
18 2 2.1 2.6 a 2 +1.4 1 7 
19 a ea 1.9 +.5 1 6 
20 2 iL as 1.4 6 +10.0 
21 1 att 
22 be “3... 5 

N, P(x?) 150 .89 64 189 .14 | 164 .53 | 122 .84 | 227 .59 


Other models have been developed for the negative binomial (2) and 
there is no reason to suppose that the possibilities have been exhausted. 
This is suggested in part by the variety of its applications to biological 
data. Some applications to fresh-water dredge samples (19) and to 
marine plankton (5) are shown in Table 6. It has formed the basis for 
a sequential sampling scheme for tapeworm cysts in whitefish (22). It 
has described effectively the distribution of insects in the field, in- 
cluding the beet leafhopper (8) and the wireworm (31), of ticks on 
individual sheep (12), of mites on rats (10) and of isopods under boards 
(9) (Table 6). Some failures in fitting can be ascribed to the inefficiency 
of the moment estimate of k. One of these is a count of Ribes on Mt. 
Spokane (14, 31), where Equation 3 gave k, = .134 and Equation 6 
gave k = .205. Even though the estimates did not differ significantly, 
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TABLE 7 
Distributions of quadrat counts with apparent bimodality (Obs. f) and their expected frequencies for 
the negative binomial (Bin ¢), Neyman type A (Ney #@) and Thomas double Poisson (Thom ¢) dis. 
tributions, representing three species in a salt marsh, Salicornia stricta, Plantago maritima and Ameria 
maritima (4, 29), and a weed on arable land, Chenopodium album (25). 

Plants per Salicornia Plantago Ameria Chenopodium 
quadrat | Obs. Bin Ney | Obs. Bin Thom] Obs. Bin Ney Thom]! Obs. Bin Thom 
z ee ey Sa Sl pee Be BP Pegs Oe 
0 t 3.2 107 12 HE cme A 57 54.1 54.9 56.4 19 9.2 19.0 
1 3 6.4 4.0 | 8 11.3 Gan 6 16.2 7.9 5.6 5 13.5 5.0 
2 8 8.4 6.5 9 i2.4 10.7 12 02.0" 205 2590)'6 6 14.3 9.7 
3 13 9.4 a) pe 12:0 1.2 5 5.8 9.0 9.6 9 13.0 10.6 
4 11 9.6 8.1 6 10.8 10.8 5 3.9 6.5 6.8 5 LTO 96 
5 9 9.2 7.9 8 9.3 10.0 5 2.8 4.3 4.3 20 8.8 8.3 
6 8 S42 7-6 11 7.8 8.8 7 QO 2 Bie 228 14 CrSiusee 
7 10 7.5 7.0 7 6.4 7.4 1 1.5 1.8 1.8 8 5.2 6.0 
8 3 6.5 6.4 8 5.1 6.0 Tcl Ld ye 4 3.8 4.9 
9 3 5.6 257 rs 4:0.) 4.7 1 8 9 sii 3 229 Lad 
10 8 4.7 4.9 3 3.2 3.6 1 6 4 4 2 205 23,0 
11 3 3/1 4.2 4 BID 2. +4.6 +7.9 

12 4 2:5 —3.6 i 9/7 2.0 $2.2 +.8 +.5 
13 i 2.1 2.9 1 1.4 1.4 
14 eve (2e: 
15 3 1.3 1.9 
16 1.0. 1.5 1 44.3 +3.0 
17 8 1.2 
18 t 6 9 
19 1 
20+ 3 +5.9 +2.9 
N, P(x?) 98 .48 17 | 100 14 .74 | 100 144” 53 .29 95 <.001 <.001 


the x’ test for the agreement of the observed and expected frequencies 
from the two estimates gave P = .017 and P = .25 respectively. 

Solely as a method for summarizing a set of observations with two 
statistics, one of them the mean, the negative binomial should be of 
increasing interest to biologists. An adequate fit of this distribution 
to the data may serve to justify further statistical analysis such as 
sequential sampling (22) or a transformation for stabilizing the variance 
preparatory to the analysis of variance. But since several quite different 
models might possibly underlie data which conform to the negative 
binomial, one cannot use this agreement as the sole basis for justifying 
a particular model or conclusions based upon it. 

Comparisons with Other Distributions for Over-Dispersion. Although 
the negative binomial is the easiest to compute and the most widely 
applicable of the distributions for over-dispersion, several others have 
been proposed. Some of these have two or more modes, while the nega- 
tive binomial has only a single mode. In fitting the negative binomial 
to observed distributions, any “extra” modes are assumed to represent 
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random variation, as in fitting the negative binomial to the distributions 
of corn borer for treatments 1 and 2 in Table 3, of Tanytarsus and of 
Microcalanus in Table 6, and of quadrat counts in Table 7. In some 
cases, this assumption worked well, in others passably and in a few 
cases badly. Three two-parameter distributions have been described 
which may have two or more modes, the Neyman contagious type A, 
the Thomas double Poisson and the Polya. Hach has been based upon a 
mathematical model of biological interest. 

The Neyman contagious type A distribution (21) assumes an initial 
population in which groups are dispersed uniformly, within the limits of 
the Poisson, over the area (or period) represented by the final counts. 
Individuals then move out from these random centers independently 
but at too slow a rate to equalize their dispersion over the entire area. 
As numerical examples, Neyman cites the distribution of corn borers 
following treatment 2 (Table 3) and Student’s haemocytometer counts 
of yeast (Table 2). In accord with theory, corn borer eggs are laid in 
masses and the larvae on hatching tend to migrate to neighboring corn 
plants. The Neyman expected frequencies, as fitted by Beall, are 
shown in Table 3. For treatment 2 they reproduced the observations 
better than the negative binomial but the reverse was true for the re- 
maining three treatments. In the Armeria counts of Table 7, the 
Neyman type A again reproduced the bimodality which was missed by 
the negative binomial, but when fitted to unimodal distributions, the 
negative binomial did as well or better (Tables 2, 5, 6, 7). Fracker and 
Brischle (14) fitted the Neyman type A curve to six series of Ribes 
counts. None of them approached agreement but five of the six agreed 
well with the negative binomial when computed by method 1 by Wadley 
(31) and the sixth agreed when fitted efficiently as noted above. Despite 
the advantage of a potentially multimodal curve, the range of distribu- 
tions fitted by the Neyman curve seems to be more restricted than with 
the negative binomial. 

The Thomas double Poisson distribution (29) is also potentially 
multimodel with peaks that are somewhat more sharply defined than in 
the Neyman type A. It assumes that the number of plants per quadrat 
can be broken down into two Poissons, one of the number of cluster 
centers and the other of the number of additional plants (after the first) 
in a cluster. Thus individual plants of Plantago maritima tended to be 
grouped, possibly because their inflorescences are short compact spikes 
which frequently fall to the ground with the seeds still in the capsules 
(4). Frequencies computed by moments for this and other series with 
the Thomas distribution are given in Tables 5 and 7. They closely 
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resemble the expectations for the Neyman type A and have similar 
advantages and limitations. 

Under certain conditions, the Polya distribution (2) may have two 
modes with somewhat larger frequencies at x = 0 than in the negative 
binomial, which in other respects it closely parallels. Under certain 
conditions, the Polya distribution might represent the number of in- 
dividuals per quadrat in a growing population better than the negative 
binomial (2). If, for example, the original progeniters were released all 
at the same time rather than continuously over a period, the Polya 
distribution would be indicated, but if the individual rates of birth and 
death were constant and immigration occurred at a constant rate, the 
negative binomial would be more appropriate. So far as most biological 
distributions are concerned, the Polya seems to be a minor variant of 
the negative binomial. 

A quite different. alternative is provided by Fisher’s logarithmic 
distribution (13). In studies of species, area and abundance, this has 
been applied effectively to the number of species (f) represented by 
x = l, 2, 3, --- individuals, in the catches of insect light traps for 
example (33). If k in a negative binomial approaches zero and we omit 
the zero class, the observed frequencies can be considered a sample 
from a logarithmic distribution. Williams (33) has fitted the logarithmic 
distribution to Buxton’s data on the number of Hindu male prisoners 
in a south Indian jail with x = 1, 2, 3, --- lice per head, omitting the 
612 individuals (see Anscombe’s correction, 2) who were free of lice. 
The observed frequencies agreed far better with the logarithmic ex- 
pectations than with those computed from the negative binomial by 
method 1. In this instance, however, method 1 has an efficiency of less 
than 50 percent and method 2 and efficiency of nearly 98 percent. 
When recomputed with @ = 6.9357 + .5634 and k, = .144198 + .008195, 
the divergence between the expected and observed frequencies (‘Table 8) 
was well within the sampling error (x” = 31.74, n = 32). Neverthe- 
less, the observed second and third moments were significantly larger 
than their expectations, with U = 243.3 + 38.9 and T = 26488 + 2048. 
Yet, k. was significantly larger than zero, its expected value for a 


logarithmic distribution. 


Reexamination of the data showed that four of the 1,073 prisoners 
had more than 200 head lice. When k, was recomputed without these 
individuals, the second and third moments were no longer discrepant, 
with U = 22.86 + 25.68 and T = 575 + 3297. This example indicates 
the disproportionate effect upon T and U of a very few large values of x. 
With these omissions, the expected frequencies agreed still more closely 
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TABLE 8 
The observed distribution (f) of lice of all stages on the heads of Hindu male prisoners in Cannamore, 
South India, in 1937-39, and the frequencies computed with the negative binomial (Bin) from all of 
the data ($), and from all except four prisoners having more than 200 lice (f’), compared with the 
frequencies computed for Fisher’s logarithmic distribution (Log @) by omitting the 612 prisoners 
without lice (33, 2). 


Lice No. of heads Lice No. of heads Lice No. of heads 

per per per 

head Obs. Binomial Log | head Obs. Bin Log head Obs. Bin Log 
x ip p q¢’ co) x ff co) co) x a i) ’ 
0 612 612.0 612.0 11 3 9.6 8.4 22-23 8 8.3 7.0 
1 106 86.5 90.5 107.2 12 10 8.7 ene 24-25 r 7.4 6.2 
2 50 48.5 50.8 52.8 13 8 8.0 6.9 26-27 10 6.6 5.6 
3 29) 33 0 woDRouN etna 14 6 74 6.3 28-29 6 6:0.) 5.0 
4 33 26.1 27.38: 25,6 15 3 6.8 5.8 30-32 2 7.9 6.7 
5 20 2142 ©2212. 22:2 16 6 6.3 5.4 33-35 z 6.9 5.9 
6 14 17.8 18.5" 16.6 17 7 5.8) 0 36-38 9 6 Diese 
7 12 15.3 15.8 14,0 18 4 5.5 4.6 39-41 5 5.3 4.6 
8 18 13.4. .1378 12.4 19 7 5.1 4.3 42-45 5 6.1 5.3 
9 11 LIS O) 1252 101-6 20 7 4.8 4.1 46-49 7 5.2 4.6 
10 11 LOT LOR. Se 21 3 4.5 3.8 50+ 27° «87.5 oto 


with the observed values, as evidenced by the expectations for 7 = 0 
to 10 in Table 8. The agreement of the negative binomial expectations 
with the observed frequencies of mites on individual rats in Table 6 
and of ticks on sheep as given by Fisher (12) suggest a greater usefulness 
for the negative binomial than for the logarithmic distribution in 
studies on ectoparasites. 

Fitteng a Single k to Several Negative Binomial Distributions. The 
analysis and interpretation of most biological data are facilitated by 
stability inthe variance, so that only means need to be compared. A 
similar stability in the coefficient & would both increase the utility 
of the negative binomial and increase our confidence in its suitability 
for a given problem. The assumption of stability is essential in some 
cases as in the sequential sampling of whitefish for tapeworm cysts (22). 
By a simple expansion of the maximum likelihood solution, a combined 
k, can be computed from a series of distributions and their homogeneity 
in respect to k tested by x’. 

The calculation consists of computing the score z for each component 
distribution with the same trial values of k’ and adding the scores for 
each k’ over all component distributions. Trial values are selected until 
two sums of the scores, S(z), are obtained which closely bracket 0. By 
interpolation between them, the required estimate f, is that for which 
S(z) = 0. If these sums are designated as z, and 2, from corresponding 


ae Oe 8 et = 
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trial values of ki and ki , the error variance of k, may be computed by 
equation 10. 


The homogeneity of the k’s in the component distributions depends 
upon the z; and z, in each individual series for ki and Chere io 


z3(ki = ks), ‘@, = 2s) (18) 


is computed from each component distribution and from the totals over 


TABLE 9 


Calculation of a combined k. by maximum likelihood from the four distributions of corn borer in 
Table 3. N In(10) = 276.310212. 


Calculation of score with Eq. 6 for -003 232 
Treatment Term 
No. k= 455 ke’ = 1.4 he A AT kid Aza 23 — 24 
1 S{Az/(ki + 2)}- 156 .6745 164.1457 158.8287 158.6100 
N In(1 + £/k’) 156 .6387 162.7297 158 .4110 158.2317 
25 .0358 1.4160 4177 .3783 .0133 
2 S{Az/(k’ + 2)} 138 .3464 145.2368 140.3318 140.1302 
N In(1 + 2/k’) 136.1976 141.8773 137 .8480 137 .6810 
2 2.1488 3.3595 2.4838 2.4492 .5349 
3 S{Az/(k’ + 2)} 81.5615 86.2613 82.9113 82.7742 
N In(1 + £/k’) 82.5085 86.6970 83.7206 83.5979 
Zs — .9470 — .4357 — .8093 — .8237 1365 
4 S{Az/(’ + 2)} 81.3606 85.9803 82.6881 82.5532 
N In(1 + £/k’) 83.5105 87.7329 84.7322 84.6083 
i —2.1499 —1.7526 —2.0441 —2.0551 1.1395 
Total Ratios 1.8242 
S(z) — .9123 2.5872 .0481 — .0513 .0001 
x? = 1.8241 


By interpolation &. = 1.47145, V(é-) = .003/(.0481 + .0513) = .03018 by Eq. 10. 


all distributions. The sum of the ratios computed from the g individual 
distributions for ki and ki, is diminished by the ratio computed from 
the corresponding totals of z; and z,. The difference is x’ for testing 
the homogeneity of k with g — 1 degrees of freedom. In solving this 
expression, it is immaterial which boundary value, 23; or % , is used, 
since the same x’ is obtained with either one. ee 

The procedure can be illustrated with the four distributions of 
corn borer in Table 3, representing four different treatments in the same 


196 BIOMETRICS, JUNE 1953 


field. The calculation requires for each treatment the mean < and the 
accumulated frequencies A, corresponding to those in the third column 
of Table 1. Starting with a trial value of kf = 1.5, the calculations in 
Table 9 gave S(z) = —.9123, so that a smaller trial value, k; = 1.4, 
was selected next, giving S(z) = 2.5872. By interpolation between 
ki and k for S(z) = 0, ki = 1.47 and from the resulting S(z) by further 
interpolation, k, = 1.473. Interpolation between k; and k{ gave the 
maximum likelihood estimate k, . Testing the homogeneity of k over 
the four treatments required the ratios in the last column of Table 9, 
from which x? = 1.824 with three degrees of freedom. We conclude 
that the k’s for the four treatments did not differ significantly and 
could be represented by a single value of k. = 14715 + abe ye 

Summary. In analyzing biological counts for which the variance 
is significantly larger than the mean, the value of the negative binomial 
distribution is enhanced by a simplified maximum likelihood method 
for estimating the coefficient k. The calculation is illustrated in detail 
with a numerical example and compared with other estimates of the 
same parameter. The error variance of the statistics of the negative 
binomial and tests for its agreement with a set of observations are 
illustrated with the same example. Models underlying the negative 
binomial are reviewed with reference to observed distributions of both 
plants and animals. A comparison with other distributions for over- 
dispersion suggests that the negative binomial is the most widely 
adaptable and generally useful of those that have been proposed so far. 
Finally, the maximum likelihood estimation is extended to the calcula- 
tion of a single coefficient k from a series of similar distributions and 
the testing of their homogeneity by x’. 
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Nore ON THE EFFICIENT FITTING OF THE 
NEGATIVE BINOMIAL 


R. A. FisHer 

When it is desired to examine the representation of data having a, 
counts of x, for values of x from 0 upward, by means of the negative 
binomial distribution, in which the expectation of a, 
N (k an “ame 1)! Dp 

ig Se Lam OE Se 
is expressed in terms of two parameters p and k, it is well known that 
the equation of estimation based on the mean aie Be: 


E(a,) = 


pk = & Ads Se a 
is fully efficient. ay 
A second equation, with efficiency varying with the cinoumstances, 


may be taken from the second moment or variance 


pp+lk=s 
or, among other ways, from the frequency of zeros < ae 
(+2) = N/a ee 


In 1941, the author gave a number of rules (12, p. 185) for indeing ; 


XM vn nscombe 
b: ncy both 
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whence 
sta, 2 (log m)} = Stea,) — 7 5 Sta.) 
op fae 7 pl + p) 
= ee ayia 
If, therefore, we choose p nae that 
p = £/k 


the likelihood will be maximized for variation of p. 
The second equation for maximum likelihood is derived from 


5 (log m,) = F(k + © — 1) — F(k = 1) = log (1 +9) 
where F(z) stands for 
d 
ae log (2!) | 


and 


FQ) — Fe — Ive. 


The efficient score for k is therefore = 


sa. 2 (los m2} ag r 


: 
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as usual, the rate at which the score is decreasing as it passes the zero. 
Hence, the sampling variance and the standard deviation of the estimate 
may be calculated (p. 182). 
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THE FITTING OF MULTI-HIT SURVIVAL CURVES 
A. W. KIMBALL 


Oak Ridge National Laboratory 


1. Introduction. 


Because of the rapid growth of research in the field of atomic energy 
for both military and civilian uses, biologists are placing more and more 
emphasis on the study of the effects of radiation on living organisms. 
The use of microorganisms in radiation studies has increased steadily 
since they are in general easier and cheaper to work with than higher 
forms of life and since the results of such research may be used to define 
areas of investigation with more expensive material. For the most part 
data from experiments with microorganisms are survival proportions of 
cultures exposed to different experimental conditions, frequently varying 
amounts or intensities of radiation. For this reason the interpretation 
of survival curves plays a very important role in the field of modern 
radiobiology. 

Among research workers who have realized this fact are Atwood and 
Norman (1949) who have discussed rather thoroughly the theoretical _ 
approach to survival curves from several different points of view. 
Graphical methods of estimating parameters have been suggested by 
these and other authors, but the problem of estimating sampling errors 
seems to have been overlooked. The purpose of this paper is to present 
some statistical methods for obtaining parameter estimates and their 
standard errors. Although the analysis of a single-hit curve is simple 
and straightforward, it will be presented first for completeness. 


2. The single-hit curve. 


If a population consists of organisms each having one sensitive unit 
inactivation of which causes loss of viability, the probability that the 
unit is not hit when exposed to a dose x of radiation is assumed to be 
e ** where k is some positive constant independent of dose. It follows, 
therefore, that the expected proportion of a population surviving a dose x 
is equal toe”. If several cultures of the same population are exposed 
to different doses x; (¢ = 1, --- , p) and the observed proportions surviv- 


ing are S; , the equation* 


(1) y; = log S; = —ka; + « ; 


*Throughout the paper all logarithms are taken to the base e. 
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where ¢; represents the amount by which y; differs from its expected 
value, provides a simple method for estimating k. If the ¢; satisfy 
certain assumptions (see section 4), an estimate of k is given by 


= D Pp ri 
= =) ty Dy ee 3 
i=1 i=1 
and the variance of f is estimated by* 
si = aes ley 


where 


2 1 2 73 
8 serena Mai! +k >) cy). 


TABLE 1 
FITTING OF A SINGLE HIT CURVE, EQ. (1) 
[Data from Lea, Haines and Coulson (1936)] 


Proportion — Dose in minutes 
surviving re of exposure 
(S) (y) | (@z) 
90 —0.105 Ly 
84 —0.174 F 2 5s 
80 ~ —0.223 4.9 
.69 —0.371 0.5) = ee 
58 —0.545 “1A \ 
AT 0.755 Prey ee ~S 
ae : meee 55.6 
208s ude | —2.813 | 85.7 


2 


atin 
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In Table 1 the method is illustrated with data from Lea, Haines and 
Coulson (1936). Spores of B. mesentericus were exposed to the beta rays 
of a radon source for varying lengths of time. Different numbers of 
loops were exposed at each dose in an attempt to keep the errors uniform. 
The estimate & = .0321 when expressed in reciprocal seconds is 
5.35 X 10°-* which agrees well with the value 5.25 « 107+ given by the 
authors. 


3. The multt-hit curve. 


Frequently populations consist of organisms each having n sensitive 
units all of which must be inactivated before the organism will lose its 
viability. If the hits are independent and if k is the same for each unit, 
the probability that all n units are inactivated is (1 — e “*)". Accord- 
ingly, the expected proportion of a population surviving a dose x is 
1 ee (1 a ey. ‘ 

One method for fitting survival curves based on this model has been 
used widely. For large doses 


(2) ee, lee) 


so that in an experiment resulting in observed proportions surviving S; 
over a range of doses x; (¢ = 1, --- , p), the equation 


(3) y; = log S; = logn — kx; + ¢; 


leads to a method for estimating n and k. In practice the data are plotted 
on semi-logarithmic paper and only those points at large doses which 
appear to lie nearly on a straight line are used in fitting (3). If the 
assumptions about e; are correct, estimates of k and n are given by the 
familiar formulas 


k 


Sie ayo) dy @ Pp 
logiz =got+ he 
where £ and @ are arithmetic means. The variances of & and log # are 
si =8'/>> (x — 2)” 
Sop 0. eis a Sn) 


2 
Slogi 


where 
f= 5D w- a +k Le - ay - av) 
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TABLE 2 
FITTING OF A MULTI-HIT CURVE, EQ. (3) 
[Data from Pomper (1952)] 


Proportion (he's X-ray dose in 104 
surviving roentgens 
(S) (y) (x) 
86 
64 2 
Wea. PAS BES ee eee eS i ed ae 
072 —2.631 6 
020 —3.912 8 
0056 —5.185 10 
0020 —6.215 12 
00045 —7.706 14 
Se) = 70,00 = = 9.00 
— Ly — p? = 26.158162 g = —4.535 
Li (@ — ay — 9) = —42.75 > 2° = 556 
; “1 


& = —(—42.75/70) = .610714 Es 
log A = —4.535 + .610714(9.00) = .9614 


t= 2.62 
mR OL At 7? = 326. ee ON ee B 
| * = = 01254 See sia 
| tess = (01254)(856)/6(70) = 0108 
| ; a Wa 
=> aie} > : : . { i ; 
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Although at first glance this technique seems satisfactor y and appeal- 
ing because of the simplicity of the calculations, closer scrutiny reveals 
that n and k are actually very poorly estimated. Since the organism is a 
diploid we know that n = 2. But if this information were not known, 
the 95% confidence interval for ”, which is (1.8, 3.7) would tell us only 
that it is a diploid or a triploid, pohly even a tetraploid. Furthermore, 
as the number of sensitive units per organism increases, the confidence 
intervals computed from this method become progressively worse. 
Actually the poor quality of the estimation is only a reflection of the 
well known fact that confidence intervals for a regression line are ex- 
tremely wide outside the range of the observed data. In other words the 
gain in simplicity achieved by virtue of approximation (2) is more than 
offset by an increase in the errors of estimate. The same argument 
without modification cannot be applied to the estimation of k by this 
method, but as it turns out in this example, k is also poorly estimated. 
Clearly, some approach which permits the use of all points on the curve 
must be used. 

As an alternative to (3) we may write 


(4) u; = log (1 — S) =nlog (1 —e) +, 


which involves no approximations. The estimation procedure requires 
the minimization of the quantity 


V = > lu; —nlog (i — ee) 
t=1 

for variations in n and k. The equations for # and & so obtained are 
non-linear and in general must be handled by iterative methods. How- 
ever, for most experiments, a short-cut method is available which ordi- 
narily will give reliable results. Stepwise the method may be described 
as follows: 

1. Select a trial value of k, say ky , which may be obtained from a 


plot of the higher dose points. 
2. Find the value of n, say mn» , which minimizes V for k = keeitas 


= Lw/ Le 


where 
—ko De 


= log (1 —e 


3. Compute Vain = = ur — MH D, WD. 
4, Repeat the process for two or three more trial values of k, chosen 


so as to bracket the absolute minimum of Vnia. 
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5. Plot each Vinin against its corresponding trial value of k and inter- 
polate to obtain the value of k which makes V an absolute minimum. 

6. This final value of k and the corresponding n computed from the 
formula in step 2 are the desired estimates k and n. 

In selecting an initial trial value of k, it should be remembered that 
the estimate of k obtained from the semi-logarithmic plot of the higher 
dose points will usually be too large. In Pomper’s data the estimate 
was 0.61, so the initial trial value chosen was 0.57. The computations 


TABLE 3 
FITTING OF A MULTI-HIT CURVE, EQ. (4) 
[Data from Pomper (1952)] 


Proportion log (1 — S) log (1 — e7:57) 

killed 

1-S w Re 1 — e Siz v 
.14 —1.9661 sO .4345 — .8336 
.36 —1.0217 1.14 . 6802 — .8854 
19 —0.2357 2.28 .8977 — .1079 
.928 —0.0747 3.42 . 9673 — .0333 
. 980 —0.0202 4.56 9895 — .0105 
.9944 —0.0056 5.70 . 9966 — .0034 
9980 —0.0020 6.84 .9989 — .0011 
99955 —0.0004 7.98 .9997 — .0003 


>> w? = 4.971023, > vo? = 856210 >> w = 2.060758 
Mo = 2.060758/.856210 = 2.406837 
Vinin = 4.971023 — 2.406837(2.060758) = .01111 


RESULTS OF SUBSEQUENT TRIALS 


k Las in 
.55 .00931 
.50 .00938 


for steps 2 and 3 are shown in Table 3. Note that the quantity, >> u7, has 
to be computed only once, since it does not involve k. Complete calcu- 
lations are given only for k = 0.57. The results for two more trial 
values are given at the bottom of the table. The points are plotted in 
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5 
Virin® 10 


FIGURE L GRAPHICAL ESTIMATION OF K 


Figure 1, and the curve drawn free hand through the points indicates an 
absolute minimum at about k = 0.525. Using this as the final & we find 
> vo? = 1.0065, >) w = 2.2349 and’ = 2.2205. These estimates were 
found to satisfy the true estimation equations to three significant figures. 

The computation of standard errors for estimates 7 and & obtained 
from model (4) is complicated by the fact that the estimates are not 
linear functions of the wu; . In this case the only practical method of 
obtaining error estimates is to compute the asymptotic variance-covari- 
ance matrix. The formulas, derivation of which is outlined in section 4, 
are 


SSB ABI)! 
i = s'[A/(AB — C?)] 


2% 
I 


% 
=> 
I 
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where 
A= div 
i —hka 2 

Ba ye | ve | 

X (1 a 

ts 

cna ef me] 

Xu ad ee) 
= — [Sw —2 Dw] 

TABLE 4 


CALCULATIONS FOR STANDARD ERRORS OF # AND &, EQ. (4) | 


| ee Oy ay ke Hs) ieee, 6) 
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The calculation of A, B and C is carried out in Table 4. The columns are 
set up in an orderly fashion and B and C can be obtained from sums of 
squares or cross-products of the appropriate columns. Other tabular 
arrangements may be found more suitable for some computers. Finally 
the standard errors are found to be s, = .14, sg = .033. Although the 
number of degrees of freedom to be associated with s’ is not well defined, 
it is suggested that for purposes of computing confidence intervals, 
(p — 2) be used in conjunction with Student’s ‘“¢’”’ in the usual manner. 
When this is done the approximate 95% confidence intervals for /# and k 
are (1.9, 2.6) and (.44, .60), respectively. 

The computational labor in fitting model (4) is greater than in fitting 
model (3), but it is not excessive. In any particular case, the additional 
work needed to obtain more accurate estimates must be balanced against 
the consequences of possibly incorrect or misleading conclusions which 
might result from the simpler analysis. In the example just discussed, 
the simpler analysis might have led to the erroneous conclusion that the 
original culture was a triploid and would have overestimated k. Whether 
such errors can be tolerated will depend on the individual experimenter 
and the significance of a particular experiment. 

More general models have been discussed by Atwood and Norman 
(1949). The population may consist of organisms each with a different 
number of sensitive units. In Neurospora crassa, for example, popula- 
tions of macroconidia are found in which cells have different numbers of 
nuclei. In this case the method of fitting just described estimates the 
average number of nuclei per cell in the population. A further generality 
is introduced if the sensitivities of the n units in an organism are not the 
same, i.e., if each unit has a different k. The fitting of curves based on 
this model would necessitate the use of an iterative solution of estimation 
equations. Because of the length and complexity of calculations re- 
quired for such methods, their use is limited. 


4. Some comments on the theory. 


The statistical methods employed in the previous sections depend 
first of all on the assumption that the e; are independent normally dis- 
tributed random variables with zero expectations and a variance which 
is the same for all7. The truth is that they are independent and do have 
zero expectations, but they are not normally distributed and do not have 
uniform variances. In most cases the survival proportions are obtained 
from the ratio of two bacterial counts which suggests that the e; might 
behave somewhat like the ratio of two Poisson variates. Conceivably 
one could find a transformation which would render the variances nearly 
uniform and perhaps induce approximate normality, but it would likely 


- 
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have two disadvantages. It would probably do poorly at very high or 
very low survivals, the latter having occurred in the example just dis- 
cussed, and it would make the introduction of a specific model relating 
survival and dose even more difficult than it already is. The problem 
of heterogeneous variances is often handled quite adequately by the 
experimenter simply by increasing the number of plates or loops at 
doses which are expected to have e, with high variances. One never 
achieves perfect homoscedasticity by such procedures but they usually 
suffice for most practical purposes. As for the lack of normality, the 
logarithm of the ratio of two Poisson variates in which the denominator 
is determined much more accurately than the numerator might be 
expected to have a moderately symmetrical distribution and therefore 
not depart so far from normality as to cause any serious difficulties. 

The standard errors of / and & given in the last section are obtained 
from the asymptotic variance-covariance matrix which is derived in the 
usual manner. If the e; satisfy the assumptions in the previous para- 
graph, the probability of the sample is 


PRS) (+) exp = 5a ys fu; — n log (1 — ctypl 


and the likelihood is L = log P(S). Then the asymptotic variance- 
covariance matrix is the inverse of the matrix 


Fa Dy aL 
an? dn dk i A C 
= = -5 
oL el yi 
an dk OR nee 


where A, B and C are defined as in section 3. In practice, o, n and k 
must be replaced by their estimates, s’, A and &. This of course means 
that the standard error estimates are only approximate, and a further 
ambiguity is introduced when an attempt is made to estimate confidence 
intervals for n and k. One reasonably safe procedure is to associate 
(p — 2) degrees of freedom with s” and to compute confidenee intervals 
using Student’s “?”. It is difficult to determine the accuracy of such 
methods. A rough check, however, seems to indicate that they are 
conservative. If k is assumed to be known and equal to 0.525, the vari- 
ance of 7, which is then properly estimated, is 0.001396. The variance 
of 7 computed from the asymptotic variance-covariance matrix is over 
ten times as large as this, a fact which would seem to support the conten- 
tion that the approximation is on the conservative side, 
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5. Summary. 


The fitting of theoretical survival curves to data obtained from radia- 
tion experiments with microorganisms is discussed from a statistical 
point of view. Formulas are given for obtaining estimates of parameters 
and standard errors. The methods are illustrated with two sets of 
experimental data. ; Sa 

The author gratefully acknowledges many helpful discussions with 
Dr. K. C. Atwood and is indebted to Dr. S. Pomper for permission to 
use his hitherto unpublished data. 
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POPULATION GROWTH OF THE SEXES 
Lro A. GoopMaAn! 


University of Chicago 
1. INTRODUCTION 


Many authors (see, e.g., references 1-19) have noted that the sex 
distribution at birth is not equal (more males than females are born) and 
that the ability of the sexes to withstand the forces of mortality is not 
the same (the death rates at most ages are higher for males than for 
females). This phenomenon is not confined to man alone, but it also 
occurs, though not universally, among a number of other species. The 
literature contains many discussions of the importance and implications 
of these two facts. 

At a recent meeting of the Royal Statistical Society several speakers 
pointed out that the possibility of variations in the relative numbers of 
the two sexes has been too long neglected in population mathematics. 
D. G. Kendall [22] has discussed some of the characteristic difficulties 
of this problem and suggested some approximations which we shall 
extend. Some work on this problem, from a deterministic standpoint, 
has been carried out (see, e.g., references 1, 13-19). Kendall has men- 
tioned that the problem of expressing the modes of population growth 
for the two sexes in stochastic form would be a very difficult one. 
References. 20—26 deal with this problem when one sex is considered and 
their bibliographies give references to much of what has been written in 
the field. 

J. Yerushalmy, in a very interesting paper [1], describes the age-sex 
composition of the population resulting from natality and mortality 
conditions. If we are interested in the ultimate stationary population 
we may determine its over-all sex ratio from Yerushalmy’s analysis. 
In this paper we shall consider the problem of determining the over-all 
sex ratio for populations which need not be stationary and also the prob- 
lem of studying the population growth of each sex. 

We shall consider various mathematical models of population growth. 
In order to make possible an analytic treatment of the subject, these 
models necessarily will be oversimplifications of reality. 


1This paper was prepared in connection with research supported by the Office of Naval Research. 
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2. DETERMINISTIC MODELS 


Extending Kendall’s [22, p. 247] approach we have the following 
model: If M(t) and F(t) are the number of males and females respec- 
tively at time ¢, then these quantities satisfy the differential equations 


i 
dM __om + A(M, F) 
dt 

= —bF + A(M, F) 


where a is the intrinsic male death rate per male per unit of time, b is the 
intrinsic female death rate per female per unit of time, A(M,F) and 
A’(M,F) are functions of M and F representing the contributions from 
the male birth rate and female birth rate, respectively. (Kendall dealt 
with the case where a.= b and A = A’.) 

Consider the case where A(M,F) = uF and A’(M,F) = vF; ie., 
where the birth rates depend on the female population size F (females 
are marriage dominant, see [17], [18]). We then have 


dM 


ak —aM + uF 
= —bF +0F = (v — O)F. 


The solution of these equations is 


UA 
(@— b + a) 


F(t) = Ae“, 


where A and B are determined by the population composition at ¢ = 0. 
We have that the sex ratio is 


Mp a0 rr Beaittenel Patt) 


and the ultimate sex ratio is S = S(o) = u/[v + a — 6], when 
vt+ta—b>O0. We note that M and F tend jointly to infinity when 
v — b > 0, and to zero when v — 6 < 0. When the female death rate 
equals the female birth rate, v — b = 0, M and F tend to nonzero limits 
and the sex ratio approaches S = u/a the ratio between the male birth 
rate and the male death rate. 

Now consider the case where A(M,F) = uM and A'(M,F) = oM; 
i.e., where the birth rates depend on the male population (males are 


M(t) = daar — Peaw 
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marriage dominant). By applying the methods of the preceding case 
we may determine M(t), F(é), S(é), and we find 


So gta shes 
v 


when u —a+b>0. Also, M and F behave qualitatively in the same 
way as before. When u — a = 0, M and F tend to nonzero limits, and 
S = b/o. 

Let us consider the case where 


rat, 9 = CE 


and 


i.e., where the contributions from the birth rates depend on the total — 
-cisalbaiens size M + F (neither males nor females are 2 marriage domi- 


nant). We then have ; 
a -aM +%(M +m = (- —a + “y+ R= JM + cF 
dF 
oa en ome 
where , | i 
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and the ultimate sex ratio is 


S = S(o) = 2c/[g — f + hl] 


- ae an ee 


We could have determined that S was one of two values directly by 
noting that 


dS/dt = a(™) vi i= | Pai = MAP 7 dt 


~ | {MF + cF* — gFM — kM] 
ae ee a ae 
=(f —gS+cec-—ksS’. 

_ The solution of this equation for dS/dt = 0 is 


, ysoe See Ae Soe ee ee 
7 


= 2k — - 


| Moet Baral etic te ). 
(we have = Bk 7 ge i+ ies = § 

We note that M and F tend jointly to infinity when f + g +h > 0, 
and to zero when f +g + h <0. Whenf+g+h =0, M and F tend 
to nonzero limits and S = u/—2f = u/ (2a — u) = (2b — v)/v. The 
Eaten! relation between the constants is Z — 


tte Leave Vid he 2 i wae 
ae Bit dhe! Pecpsl sn Piste dS 0)? ch Aes Sate 
ies) 206-2) es. — bien tn fg = he, anche al ygepslalaiae 
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Writing VM = X, VF = Y, 
ax 


2X rhea =—ax’ + ux VY 
aX = fx +eY 
and 
2Yy c ah VY? ais 
dY 


where —a/2 = f, —b/2 = g, u/2 =c,v/2 = k. Hence using the results 
for the preceding model, we can compute X(é), Y(), X°() = M(d), 
Y’(t) = F(t), and S(t) directly. We find that 


S = S(~) = {2c/[g — f + h]}* 


~{e/[*et Nee) +f. 


Also M and F behave qualitatively in the same way as before, the critical 
relation between the constants being 


fg = ke, or ab = w, 
in which case 
S = [u/—2f} = [u/a]’ = [b/y. 

It is interesting to note that in all the models we have considered 
heretofore, when the population size tends to a nonzero limit, the sex 
ratio is what-one would expect of a stationary situation; i.e., S = bu/av, 
which equals w/a when v — b = 0, and b/v when u — a = 0, and (b/v)” 
when ab = wv (in the preceding model). 

Consider now a model which discriminates between married and 
unmarried persons. Let M, F, and N denote at time ¢ the number of 
unmarried males, unmarried females, and married couples, respectively. 


Then a natural generalization of the preceding models is governed by the 
equations 


“7 aM + uN + nN — K(M,F) 


Fite = OF aN A= mnN — UM) 


= (m+ MN + K(M,F) 
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where a is the unmarried male death rate per unmarried male per unit of 
time, 6 is the unmarried female death rate per unmarried female per unit 
of time, wu is the male birth rate per married couple per unit of time, 
v is the female birth rate per married couple per unit of time, m is the 
married male death rate per married couple per unit of time, n is the 
married female death rate per married couple per unit of time, and 
K(M,F) is the marriage rate per unit of time (the age distribution is, of 
course, being neglected throughout). 


In the case where, K(M,F) = cF (female marriage dominance), the 
equations reduce to 


or = —aM + dN —cF 
“ =gF+kN  - . * : 
a = iNeed we ee 
eee epee oo OT Sm The 
solution of these equations i ig 2 
Nig = Acre + oe ee Me 
| ins | * an tite 4 
‘ FO = (= a sacri st Be fee pat ; 2 i 
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may be evaluated. We find that 


S'(o) = 2d + f —g — h)de/If +g +h + 2allg — f +h) 


Ke Bde Ne [4 i|. 
s(o) = | (ALOR) 2c aN 

We also note that M and F behave qualitatively in the same way as 
before, the critical relation between the constants being 


and 


fg=ke, or (m+nj(b+c) = + mie, 
in which case, 
S’(o) = —d + fe/af = —(d + f)g/ak 
(u — m)(b + c)/aw + m), 


and 


_@d@+ft+a(m+nb+o 
(» + mate — f) 


=(u-m+ay(b+om+n)/av+ mece+ m+n). 


We see that the sex ratio differs from what one obtains in a simple 
stationary situation. 

A similar analysis could be carried out for models which discriminate 
between married and unmarried persons and where the males are mar- 
riage dominant (K(M/,F) = cM) or where neither sex is marriage domi- 
nant (e.g., K(M,F) = eV MF or K(M,F) = c(M + F)/2). 


Slo) = (d+ f th aje/ale —f) 


3. SIMPLE STOCHASTIC MODELS 


Let us consider a slight generalization of a stochastic model described 
by Moran [25]. If B(¢) is the total number of males, and G(é) the total 
number of females born since time ¢ = 0, and if B(t) = c, and G(t) = g 
at time ¢, the probabilities at time t + dé are given by 


Pr {B(t + dt) =e +1} =udt + o(dt) 
Pr {B(t + dt) = c} = 1 — udt + o(dt) 
Pr {G(t + dt) = g +1} =v dt + o(di) 
Pr {G(t + dt) = g} = 1 —vdt + o(dd, 
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where B and G are independent. The distribution of length of life may 
be of any general form and we suppose that the average length of life for 
the males is Ly, = 1/a, and the average length of life for the females is 
Ly = 1/b. Then it is not too difficult to see that if M(t) is the expected 
number of males alive at time ¢, and F(t) is the expected number of 
females alive at time ¢, then the sex ratio will ultimately be 


M(t) = UL ix 


S(o) = lim ie a bu/av. 
This result is the same as that of the corresponding deterministic model 
dM 
T= —-aM+u 
dF 
di —bF +0 


Now consider the following process: In order that the total population 
size remain constant, when a person dies there is a “replacement”? made 
(a birth takes place) which will be either a male or female with chances 
u/(u + v) and v/(u + v), respectively. The chance that a given male 
will die in a unit of time is a, and b is the chance that a given female will 
die. We might now consider a given male and study its life span and 
also the life span of its “‘replacement,”’ and the life span of the “replace- 
ment’s replacement,” etc. The probabilities at the tth unit of time are 
given by . 


U 


U 
D mag 


a 
Utuv 


Pr,{M} = Pr..{M}(1 — a) + Pr, i{M} + Pr,,{F} 
where Pr,{M} is the probability that the “replacement” at the ‘th unit 
of time for a given male is a male, Pr,{F} is the probability that the 
“replacement” at the tth unit of time for a given male is a female, 
Pr,{M} + Pr,{F} = 1. The value of Pr,{M} may be computed 
explicitly as a function of a, b, u, v, and ¢. As ¢ approaches infinity, we 
find that 


v U 
Frat Mar = Pr.{F}b (u + 0)’ 
and, hence, 
Pra{M} _ bu 
PPAR” —@. 


We consider only the case where Pr,{M} = 1, since the more general 


220 BIOMETRICS, JUNE 1953 


case may be analyzed using only the special case. Since the total popula- 
tion size is a constant N, the expected number of males alive in the tth 
unit of time is N Pr,{M}, and the expected number of females alive in 
the tth unit of time is N Pr,{F}. Therefore, the overall sex ratio for the 
population will ultimately be 


S(o) = bu/av. 


The model we have just considered is a special case in the general 
theory of renewal. An excellent exposition of this topic appears in [26]. 
It may be shown that more general assumptions concerning the distribu- 
tion of length of life still lead to the same results when ¢ approaches 
infinity. 

The preceding simple stochastic models have been presented more as 
an illustration of the problem than as a solution to it. We shall now 
consider more complicated stochastic models. 


4, STOCHASTIC MODELS OF POPULATION GROWTH 


We shall now consider a stochastic process which is a probabilistic 
analogue of the deterministic model describing a population where the 
females are marriage dominant; i.e., where the birth rates depend on the 
female population size.’ Although the analysis which follows pertains to 
populations where the females are marriage dominant, the results ob- 
tained may be used in an obvious manner to analyze populations where 
the males are marriage dominant. 

The population under consideration behaves in accordance with the 
following rules: 

(a) the subpopulations generated by two co-existing individuals de- 
velop in complete independence of one another; 

(b) a female alive at time ¢ has a chance wu dt + o(dt) of giving birth 
to a male and a chance v dt + o(dt) of giving birth to a female during the 
following time interval of length dé; 

(c) a female at time ¢ has a chance b dt + o(dt) of dying and a 
male alive at time ¢ has a chance a dt + o(dt) of dying in the following 
time interval of length dt. 

Let the random variables M(t) and N(¢) denote the number of males 
and females, respectively, alive at time ¢. We see that the female popu- 
lation follows the rules of a simple birth-and-death process (ef. [22], 
p. 236), the mean female population size being 


N(t) =a yer 


1The author wishes to express his indebtedness to David G. Kendall of Magdalen College, Oxford 
and Princeton University, and Charles Stein of the University of Chicago for their assistance in the 
analysis of this stochastic process. 
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while 


: . es Laan ar te 
Var {N()} = ore pute 84 


when N(0) = 1. Since the subpopulations generated by two coexisting 
individuals develop in complete independence of one another, we need 
only multiply V(t) and Var{N(t)} by N(0) in the more general case 
where there are N(0) > 1 females alive at the initial time ¢ = 0. There 
will be “almost certain” extinction unless the mean female population 
size is increasing (v > b). Whenv = b and the mean female population 
size is constant, we still find that there will be “almost certain” extinc- 
tion. 

The behavior of the male population is somewhat more difficult to 
analyze since male births depend on the female population. The method 
of analysis will be described in the Appendix. We find that the mean 
male population size is 


M = Bes ee (»—b)t —at 
eee tue Soe | 
while, 
VV bag (v + bu? — x ence zen} 
yar {M(t)} ~ v—b+a) (v — b) EE ROE rs 
- Rt res 2u(v + 6) a 
ue Vvo—b+a) aw — b\w — b+ 2a) (oo = bray 


when M(0) = 0 and N(O) = 1. In the more general case where 
N(0) > 1 females are alive at the initial time ¢ = 0, the formulas M (t) 
and Var{M(t)} are multiplied by N(0), since the subpopulations generate 
independently, 5 

In the more general case where there are M(0) > 0 males at the 
initial time ¢ = 0, these 1/(0) males follow the rules of a simple death 
process; i.e., the chance that M,(t) from among the original M(0) males 
will be alive at time ¢ is 


Se a a 


—at 


where p(t) = 1 — q(t) =e Hence, 
M(t) = Mec 
and 
Var {M.@) =MOe “lh — a Fr 
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We then add M,(é) and Var{M/,(t)} to M(é) and Var{M(j)}, respectively, 
to obtain the mean and variance of the total number of males when there 
are M(0) males and N(0) females at time t = 0. These additions have, 
of course, no influence on the form of the solution when t >. 

We find that the covariance between M(t) and N(t) is 


Cov (é) = E{IM() — M@IIN(G — NW@]} 


_— _uvtd) Jerr! i ——} uly + 0) porn! 
~~ wv—b+a) \C a(v — b) ; 


v — b) a 


when M(0) = 0 and N(O) = 1. In the more general case where 
N(0) > 1, M(0) > O, then the covariance is multiplied by N(0). 

In the special case where the constants are equal (a = b = u=v=)d), 
we have 


I 


Ni) =NO), M(t) = NOl[L-—e™“] + MOc™, 
Var {N(d)} = N(0)2Xt, 
Var {M(t)} 


N(0)[2\t — 2+ 3e* —e ™] + MOe “ll —e™], 
Cov (t) = N(0)[2At — 2 + 2e™’], 


and the correlation coefficient p(é) = Cov(t)/WVar {M(t} Var {N(d)} 
between the number of males and the number of females in the popula- 
tion is seen to approach one as ¢ becomes large. 

The methods which we have used to analyze the stochastic process 
representing a marriage dominant (female or male) population, can be 
extended in order to analyze a stochastic process where the contributions 
from the birth rates depend on the total population size (neither male 
nor female is marriage dominant). That is, the population under con- 
sideration behaves in accordance with the following rules: 

(a) the subpopulation generated by two coexisting individuals de- 
velops in complete independence of one another; 

(b) an individual alive at time ¢ has a chance u dt/2 + o(dt) of repro- 
ducing a male and a chance v dt/2 + o(dt) of reproducing a female during 
the following time interval of length dé; 

(c) a female alive at time ¢ has a chance b dt + o(dt) of dying and a 
male alive at time ¢ has a chance a di + o(dt) of dying in the following — 
time interval of length de. 

Using the second method described in the Appendix we may obtain 
the moments and cross moments of M(t) and N(t), the number of males 
and females, respectively, alive at time t. The means, variances, and the 
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covariance of M(t) and N (2) satisfy a system of five differential equations 
with constant coefficients which may be solved to determine these 
moments explicitly. 

oe us consider the special case where the constants are all equal 
(a = b =u=v=)). Then the probability Pn,n(t) that at time ¢ the 
popul ee Suiting m males and n females may be determined explicitly 
when the initial conditions are P,,,(0) = P,,.(0) = 1/2. We find that 


m+n te a aay 1 iit 


form +n > 1, and 


Po.o(t) = At/(1 + At). Whence we see that there will be “almost 
certain” extinction. The probability P,,(é) that at time ¢ the population 
contains m males, and. the probability P/(é) that at time ¢ the population 
contains n females may also be determined explicitly. We have 


Pal) = Px) = (SM) 97 + 09 


form > land 
Pot) = Pot) = (1 + Ad) /(2 + Dd). 


We find that M(t) = N(é) = 1/2, Var{M(é)} = Var{N(é)} = (2+ 1)/4, 
Cov(t) = (2A¢ — 1)/4, and the correlation coefficient p(t) = (2\t — 1)/ 
(2d¢ + 1). 


5, APPENDIX 


In studying the stochastic process representing a population where 
the females are marriage dominant, the means, variances, and covari- 
ances of the male and female population sizes were given. The following 
method which was used to obtain the first and second moments could 
also be used to obtain higher moments: 

All moments for the female population size may be obtained directly 
from its distribution which is a geometric series with a modified zero 
term (cf. [22] p. 237) or from its moment generating function. All 
moments for the male population size may be obtained from its moment 
generating function g(z,t) = E{z“‘"}. We find that ¢(z,t) satisfies the 
differential equation 


(1) se 7 = (b — Ww) — &) + we — Ie" 
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and that (z,0) = z, when M(0) = 0 and N(0O) = 1. By differentiating 
(1) once and setting z = 1, we obtain a linear differential equation for the 
mean M(t). By differentiating (1) several times and setting 2 = 1, we 
obtain linear differential equations for the higher factorial moments of 
M(t). 

The method just described gives the moments of M(t) but does not 
give the cross moments of M(t) and N(#); e.g., the covariance. Another 
method might be used to obtain the moments and cross moments of M (t) 
and N(t) which serves as an independent check on the preceding method. 
The two methods might be used simultaneously to simplify computation. 
Let P,,.,(t) be the probability that at time t the population contains m 
males and n females. Then the infinite system of functions P,,,,(¢), 
m=0,1,2,---,n = 0,1, 2, --- , satisfy a basic system of ordinary 
differential equations (cf. [23], Chap. 17). By multiplying each differ- 
ential equation by an appropriate factor and then adding the entire 
system of differential equations, we obtain ordinary differential equa- 
tions for the moments; e.g., if we multiply the differential equation 
containing the term [0P.,.,,(¢)]/d¢ by m and then add the entire system of 
differential equations, we will obtain an equation containing the term 


- OP ats) 
», & ot 


which is in fact [9M (t)]/dt. We then have a differential equation in 
M(t) which may be solved directly. 
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ESTIMATION OF VARIANCE AND COVARIANCE 
COMPONENTS* 


C. R. HENDERSON 


Cornell University 


INTRODUCTION 


The theory of variance component analysis has been discussed 
recently by Crump (1946, 1951) and by Eisenhart (1947). These 
papers and, indeed, most of the published works on estimating variance 
components deal with the one-way classification, with ‘‘nested”’ classi- 
fications, and with factorial classifications having equal subclass 
numbers. Also most papers on this subject are concerned with what _ 
Eisenhart (1947) has called Model IJ; that is, all elements of the linear 
model save » are regarded as random variables. In the above cases, 
estimation of variance components is usually accomplished by com- 
puting the mean squares in the standard analysis of variance, equating 
these mean squares to their expectations, and solving for the unknown 
variances. These techniques are described in many statistical text- 
books. 

Unfortunately, research workers in some of those fields in which 
much use is made of variance component estimates are unable to obtain 
data which have the above described characteristics. This is par- 
ticularly true in those fields in which survey data must be used or . 
where, even in a well-planned experiment, the subclasses are of quite 
unequal size due, for example, to differences in litter numbers. Also, 


*Presented at North Carolina Summer Statistics Conference June 24, 1952. 
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Model II is sometimes not appropriate. Instead the data more ap- 
propriately correspond to what Eisenhart called the Mixed Model. 
For example, the data may represent several different years, and the 
year effects should be regarded as fixed rather than as random variables. 

It is the purpose of this paper to describe some methods for esti- 
mating variance components in the non-orthogonal case and to illus- 
trate the methods with a small sample of butterfat records made by 
cows resulting from an artificial breeding program. The three methods 
described are: 


1. Compute sums of squares as in the standard analysis of variance 
of corresponding orthogonal data. Equate these sums of squares 
to their expectations obtained under the assumption of Model II 
and solve for the unknown variances. 

2. Obtain least squares estimates of fixed effects, ‘‘correct’’ the data 
according to these estimates of the fixed effects, and then using 
the corrected data in place of the original data, proceed as in 
Method 1. . : 

3. Compute mean squares by a conventional least squares analysis 
of non-orthogonal data (method of fitting constants, weighted 
squares of means, e.g.). Equate these mean squares to their 
expectations and solve for the unknown variances. 


These three methods henceforth called Method 1, Method 2, and 
Method 3 vary greatly in computational labor. Method 1 is the 
simplest. Method 2 in many cases is only slightly more difficult. 
Method 3 is usually much the most laborious. Method 1, however, 
leads to biased estimates if certain elements of the model are fixed or 
if some of them are correlated. Estimates obtained by Method 2 are 
free of the first of these biases, but not of the second. Method 3 yields 
unbiased estimates, but the computations required may be prohibitive. 
The relative sizes of the sampling variances of estimates obtained by 
these three methods are not known. 


DESCRIPTION AND ILLUSTRATION OF METHODS OF ESTIMATION 


The Data 


In New York State most artificial breeding of dairy cows is ac- 
complished with semen supplied by the New York Artificial Breeders’ 
Cooperative, Inc. This cooperative organization has approximately 
60 bulls in service. The operations of the organization are conducted 
in such a manner that it is largely a matter of chance to which bull’s 
semen a particular cow is bred, This fact as well as the large number 
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of daughters sired by each bull make the production records of these 
daughters particularly suitable for studying the genetic differences 
among bulls and for estimating the magnitudes of other sources of 
variation in milk production records. Good estimates of these variances 
are needed in designing efficient testing and selection programs. 

The difficulties in estimating the pertinent variance components 
are typical of those faced by research workers in animal breeding and 
in other fields as well. The difficulties in the present example are due 
to the following causes: 


1. Several years’ data are involved and time trends are known to be 
important. 

2. The two major classifications of the data are sire and herd. The 
number of sires exceeds 100 and the number of different herds 
exceeds 2000. 

3. The number of observations per herd-sire subclass varies; the 
majority being 0. 


We have estimated from these data the pertinent variances. Both 
Method 1 and Method 2 have been employed and have yielded estimates 
essentially the same. A small sample of records is presented in this 
paper and the three methods of estimation are illustrated. 

Table 1 shows the number of first lactation butterfat records in 
each of the year < herd X sire subclasses and also the sum of the 
records for each of these subclasses. 


TABLE 1 
Year 
Herd Sire —— Total 
1 2 3 4 
1 1 38-1414 2- 981 5-2395 
1 2 4-1766 2— 862 6-2628 
1 3 5-1609 5-1609 
2) 1 1- 404 38-1270 4-1674 
2 2; 5-2109 5-2109 
2 3 4-1563 2- 740 6-2303 
3 1 38-1705 38-1705 
3 2 4-2310 2-1134 6-3444 
4 1 38-1113 5-1951 8-3064 
4 3 38-1291 6-2457 9-3748 
Total 7-2931 | 21-9983 16-6959 13-4806 | 57-24679 
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The Linear Model 


Let y,;;, denote the record made in the h-th year by the k-th daughter 
of the j-th sire in the 7-th herd. Suppose that the appropriate linear 
model representing these observations is 


Yuu =eMtath, +s; + (As) + erin 


SE ee b= Lys, Maes 
t=1,-:-,@q > > him = N 
A a i 
~ aaa SP Total number of filled subclasses = s 


u is common to all observations. a, is common to all observations in 
the h-th year, h; to all observations in the 7-th herd, and s; to all records 
made by daughters of the j-th sire; (hs),; is peculiar to all records made 
by the daughters of the j-th sire in the 7-th herd. Peculiar to each record 
is a random element é,,;, which is assumed to have mean zero and vari- 
ance o. . The assumptions made concerning the other elements of the 
model are described for each estimation method. 


Method 1* 


Method 1 can be used only if it is assumed that, except for yu, all 
elements of the model are uncorrelated variables with means zero and 
variances o2, 04, 02, G12, OF Gz. This is, of course, the Eisenhart Model II. 

The following quantities are computed: 


2 
2 te, 
nies ae ay iene 
we Whihoc 


Ae 


HS = yb CF = * = 


N37 


Dots in the subscripts denote summation. For example, 
= pe yD Dy Yniik + 
a 4) k 


Next the expectations of the above quantities are computed. Under 
the assumptions of Model II, the coefficients of » ? and the variances in 
these expectations are as shown in Table 2. 


7 


*This method was first suggested to me by Dr. S, Lee Crump, 


—————————E—EEEE7Eo—s 
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TABLE 2 

pe o oF o The 
Te N N N N N 
A N N Ky Ke Ks; 
HS Ni Ka, N N N 
H N Ks N Ke Ke 
S N K; Ks N Ks 
(CR N Ky Kio Ku Kyi. 


N, p, 4%, 7, 8 in the above table were defined in the statement of the 


linear model. K,, K., +--+: , Ky. must be computed as follows: 


fe ih » Ny. 
De hi 
sa » aoa 
| De Do miss 
Ener ye 
ye asi 
Ke= > ; 
cad Desi 
SS 
5 se => at 


K, 


Ks 


Ky 


Ki 


Demi 
» Nn, i 
> pre a 

a 


7 Nj 


>> n,../N 


Di ms./N 


> nei/N 


VARIANCE AND COVARIANCE COMPONENTS 231 


variances can be obtained by solving the resulting equations. The 
necessary expectations are derived from Table 2. To illustrate, 
(Among Years) = E(A — CF) = E(A) — E(CF). 

Computation of the K’s is facilitated by constructing from Table 
1 the following two-way tables of subclass numbers (Tables 3, 4, 5). 


TABLE 3 
6G6ue_6e———sSsae————e—S SS 


Year 
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Year Herd Sire 
1. 2931 1. 6632 1. 8838 
2. 9983 2. 6086 2. 8181 
3. 6959 3. 5149 3. 7660 
4. 4806 4. 6812 

Total 24,679 Total 24,679 Total 24,679 


poe the above totals and the totals in Table 1, 
4806" 


2931 
hen Sa = 10,776,451 
2 ; 2 
HS = oe ae ee are. = 10,970,369 
6632° 6812? 
H = 16 i ein Fe aa MRE 
258838", S18i,. 7660) = 
$= + Hay + “op = 10,776,278 . 
7 2 
CF = ae. = 10,685,141 
: ] 


The expectations of these quantities are presented in Table 6. 
The computations of these entries proceed as follows: 


2 2 2 ; ; 
From Table 3, See ee = 19.51 


From Table 4, Ky = Ct ee 


13? 13? + 87 
2h 


opie er 7 
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i ee . rh La eee ey Le 18” 
From Table 4, K. = is” ee STG Sy = ay 
uble ¢ aoa fs 30.33 
From Table 5, K; =~ fF ek: PSL 
20 
DaveteyG,u-12 0" . 
-+- + 50 = ales fail 
m2 9 2 r2 = 
From Table3, Kk, =! — met 16,05 
i 
= 167 + 15° 2 ‘ 
Remeber, are 1S = ier Se 
i 
bs y) 2 m2 2 
7s ESSE) Bay cages Colo = Sa Ss G3 
z 5? 2 ne 2 
From Table 1, Ky. = se = za (ped) 
TABLE 6 
ra a2 a, o Ths a. 
T 57 57. 57 57. 57. 57 | 11,124,007 
A 57 57. 19.51 | 39.22 | 15.10 4 | 10,776,451 
HS 57 37.85 | 57. 57. 57. 10 | 10,970,369 
H 57 21.49 | 57. 24.04 | 24.04 4 | 10,893,666 
Ss 57 20.330) eis.bt 67 18.51 3 | 10,776,278 
CF 57 16.05 | 14.93 | 19.11 6.19 1 | 10,685,141 


The equations to be solved are presented in Table 7. The first 


equation reads: 40.95 0; + 4.58 0; + 20.11 6; + 8.91 o;, + 34% 


—21.30 


91,310. 
TABLE 7 
o% o;, a; Ths o% 
A — CF 40.95 4.58 20 2ioh. 301 3 91,310 
H — CF 5.44 42.07 4.93 17.85 3 208 ,525 
S — CF 14.28 3.58 37.89 12.32 2 91,137 
HS —H—-—S4+CF 1.58 | —3.58 —4.93 20.64 4 —14,484 
T—A—HS+CF —4.58 | —20.11 | —8.91 44 62 ,328 
Ee eB pena 2 ee er ee 
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The solution to these equations is o, = 763, 0, = 4531, o, = 1587, 
o, = —164, o? = 2950. If o;, is set equal to 0, the solution is o; = 756, 
o, = 4468, of = 1542, 0, = 2952. These estimates, of course, have no 
practical value for p, qg, 7, and s are much too small for accurate estima- 
tion of the corresponding variances. The illustration of their computa- 
tion does, however, show that even with many different classes the 
computations are relatively simple. We have successfully adapted 
most of these computations to International Business Machines opera- 
tions. 

A difficulty with Methed 1 is that it may be inappropriate to regard 
the year effects as random variables. If these effects actually are fixed, 
the estimates of o; , o, , and o;, are biased. The estimate of o; may or 
may not be biased depending on how it is estimated. This estimate is 
biased if obtained from the equations of Table 7. If, however, o7 had 
been estimated from 


2 
arn Gee ON 
the within year < herd X sire subclass sum of squares, the estimate 
would be unbiased regardless of the assumptions concerning the a,, . 

It might be well at this point to state briefly a convenient procedure 
for finding the expected values of quantities like H, S, etc. Substitute 
for the y’s their corresponding linear models, and then remembering 
the assumptions concerning the elements of the model proceed to write 
out the expectations. For example, 


t 7 Nea; 2 Jal Nii; 
= ps Dd Bin.sia + Nii 


Ett Mgt iAy + Mesghs 1.558; + 1.2;(hS) 3 
te > Do enssal’ /Mess 


; 2 
D5 Dy Bln sig? + nisjai 
i i 
2 2 
os ea Se NpiiAy 9 Nishi + N? 538; -- n: :;(hs)3; 


oa pp dX Cniix + cross products all having zero expectation] /n.,; 


) 
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a ee a ee ee Nisa + Nesj30, + N20 + n?,,02, 
+ >) Dd ln. 


h k 
as iii 
r 2 h 2 AT 2 2 2 2 
= Nw + a3 ae lace eet N(o, + 7, +03.) + 80, 
1 | bk YS | 


Method 2 


The bias in estimating variance components due to the assumption 
that fixed elements of the model are random variables can be eliminated 
by using Method 2. At the same time the relative simplicity of Method 
1 can be retained. Method 2 involves estimating the fixed effects by 
least squares, correcting the data in accordance with these estimates, 
and then applying Method 1 to the ‘‘corrected”’ data. 

This method was ‘used by Hazel and Terrill (1945) on data which 
were orthogonal except for the fixed effects. Their estimates were 
biased for they assumed for computational purposes that, except for 
fixed effects, the expectations of sums of squares of corrected data are 
the same as the expectations of the corresponding sums of squares of 
the uncorrected data. Method 2 enables one to appraise this bias and 
to correct for it. 

Before we apply this method to our example, let us consider the 
general case. Suppose the linear model is 


(1) Ye= > Bite+e a= 1,---,N 
i=1 
The x’s are known. ‘The e’s are uncorrelated with mean = 0 and 


variance = o.. : 

If the b’s are all fixed, the least squares equations for estimating 
them are as shown in (2). The b’s are, in fact, not all fixed in the 
variance components estimation problem, but they can be estimated 
by least squares as a matter of expediency. 


Pp = N 
(2) as Cib; = Yi Ci; = ds LiaXja 

4=1 a= 

? 4 N 

> C.:b; = Ya Y,;= dX LiaYa 


i=1 
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It is sometimes necessary to impose one or more linear restrictions 
on the estimates in order to obtain a solution to equations (2). 
Now suppose that b, , --- , 6, are fixed and also that for all 7 = 1, 
es 
(3) E(b; — b)? = Ko 


It is not true that all least squares estimates have this property. For 
example, in our butterfat production example described in Method 1 


E(u — uw) # Kyo: 
Instead 
se 2 1 2 if 2 i 2 2 2 
E(u — p) peeing muah mach yay er 


Method 2 applies only to correcting data by least squares estimates 
for which (3) applies. It is not difficult to determine which 6’s qualify. 

Now the data are corrected as follows (in practice only certain linear 
functions of the observations need to be corrected): 


(4) So, = Me SF 3 bitin 


t=1 


Suppose that for 7,7 = s + 1, - --,r< pall; =Owheni #37. Let 


; Dine Ds Brahe 
Then compute (5). Note that s anal 
Das. a Nell 
ae ; a i~~d ; = =. ae ray eee Vie 
Py: prea ijt ed 7 


ae ae Tir 
edits pol tia asses Sus —— jp ee Alie 7 > 


Z, 


7 


VARIANCE AND COVARIANCE COMPONENTS 237 


yi : . AAt . ; <a 
C™ are elements of the matrix inverse to the matrix of Clty yea 
- , p), and 


= 
~y ) 
SS ( peel Css 


t=s+1 


Computation of (7) is simple if s is small and if least squares equations 
(2) can be rewritten as (8). This can be done in many cases. 


(8) rs Cy 5b; = Y; i= i RS OSS 
= 


Dp, Cub, + Cab; =; pe Tiny by eas ine 0 
j=1 
Note in these equations that for all 7,7 = s+ 1,-:-, p, (ia 0 


when 7 ¥ 7. When equations (8) prevail, C“’ and 6, a gr=ely , 8) 
can be computed from equations (9). 


(9) 2 Cibs = Yi GG =T,=-- 8) 


In equations (9) 


bee —- al Ds CLT Gat ( Py, 
t=st+1 
Y¥i=Y.— >) CuY./Cu 
t=s+1 
The least squares estimates of b, , --- , b, are the solution to equa- 
tions (9), and C’’ (1, 7 = 1, --- , s) are the elements of the matrix 


inverse to the matrix of coefficients in (9). 

Let us illustrate Method 2 with the data of Table 1. We shall now 
assume that the a’s are fixed. First the least squares estimates of the 
a’s are computed. This is done most simply by estimating them jointly 
with d;; = wu + h; +s; + (As);; . Thus the equations are reduced to 
the form of equations (8). Looking at Table 1 these equations are 


7a, + 3di1 + do op Odin = 2931 
and similarly for the other “a’’ equations 
34, + 24, + 5di. = 2395 


and similarly for the other “d’” equations 
Then by (9) these equations reduce to the ones shown in Table 10. 
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TABLE 10 
ay ay as as 
3.825 —3.825 0. 0. — 73.500 
—3.825 6.492 —2.667 0. 101.500 
0 —2.667 6.000 —3.333 41.333 
0 0 —3.333 3.333 — 69.333 


One restriction must be imposed before a solution is obtainable. A 
convenient one is @, = 0. Then the solution is @, = 12.08, @, = 31.30, 
dé, = 20.80, a, = 0. 

Inverting the matrix of coefficients of Table 10 with fourth row and 
column deleted, the C"’ pertaining to the a’s are obtained. These are 
presented in Table NE, 


TABLE 11 : 


; Now the data can be corrected for the @’s. For example, the cor- 

. _ rected tote for the subclass pertaining to herd 1 X sire 1 is 2395 — 
3(12.08) — 2(31.30) = 2296.16. The corrected subclass and class — 
totals are shown in Table EC 7in) ah toa aioey te AR AI 2.05 Olde <otRi 
oer *5. shige ven writs cold in & behtel i-engead i ota + 

iy pe: ee Seiad sates at ‘os De - § 
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Using the totals of Table 12 in conjunction with the subclass and 
class numbers of Tables 1 and 4, the corrected sums of squares are 
computed. Thus, H’ = (6366.36)°/16 + --- + (6556.86)?/17. These 
quantities are: 


HS’ 


10,016,791 S’ = 9,833,620 - 


H’ = 9,954,295 CF’ = 9,774,822 


Next the amounts by which the coefficients of o2 are increased in the 
corrected as compared to the uncorrected sums of squares are needed. 
Using (7), the P;; pertaining to HS’ are computed. Looking at Table 1, 


ete hee 
PurZrgts Sa ately) 


P, = 924 1B 30) _ 3.995 ‘c= peeee 


Table 13 presents the complete set of P’s for HS’ s 


TABLE 13 


ay 
825 
508 


14 


2.667 
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TABLE 14 
ay a2 a3 
1.159 2.207 1.504 
2.207 9.765 4.988 
1.504 4.988 6.624 


Multiplying these values by those of Table 11, the addition to o; 
in E(H’) = 16.54. 
The P;; for S’ are shown in Table 15. Referring to Table 4, 


e TiS 


Pu = 99) Tt ona 


etc. 


TABLE 15 


Then the addition to o? in E(S") = 2.450 (.936438)-F --- = 21.39. 
Finally the P,; for CF’ are ie 
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me TABLE 17 
2 | 9 9 
ia O}, o Oo}, Oo, 
HS’ 37 | 57 57 57. 32.53 10,016,791 
H ay 57 24.04 24.04 20.54 9,954,295 
S 57 18.51 57 18.51 24.39 9 ,833 ,620 
CF Peto 14.93 19.11 6.19 16.57 9,774 , 822 
The equations to be solved are presented in Table 18 
TABLE 18 
o a; Ths oe 
H' — CF’ | - 42.07 4.93 17.85 3.97 179 ,473 
S’ — CF’ 3.58 37.89 12.32 7.82 58,798 
HS’ — H’' — §S'’+ CF’ | —3.58 —4.93 20.64 4.17 3,698 


The estimate of o; can be obtained readily from the residual sum of 
squares after estimating the a’s and d’s, that is from 


Pas ee ati Reduction (a, 4ud;,)z 
h i i k 
De o> > sea aaeres 415194, 007, 
A a i k 
Reduction (a, , d:;) = 12.08 (—73.500) + 31.30 (101.500) + 20.80 


(41.333) + HS = 3149 + 10,970,369 = 10,973,518 


The residual sum of squares is therefore 11,124,007 — 10,973,518 = 
150,489, with expectation 44 0? . Consequently o? = 150,489/44 = 
3,420. Substituting o: = 3,420 in the equations of Table 18 and solving, 
the estimates of the variances are o, = 3792, 0; = 409, o;, = 243. It 
is not surprising that these estimates are different from the estimates 
obtained by Method 1. The sampling variances must be extremely 
large in both cases. 

Adapting Method 2 to a model with covariates is easy to accomplish. 
In some problems this would simplify the computations. For example, 
if many years were involved, the number of fixed elements in the model 
could be reduced by fitting a quadratic or cubic to years instead of 
estimating individual yearly effects as was done in this example. If 
the years exhibit no trend, the simplest procedure is to regard q’s as 


random variables and then to apply Method 1, 
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Method 3 


When it is computationally feasible, Method 3 is the most satis- 
factory of the three methods for estimating variance components. For 
one thing it gets around the difficulty of fixed elements in the model. 
For another, it yields unbiased estimates even though certain elements 
of the model are correlated. The manner in which interference by these 
correlations is eliminated is described subsequently. 

Unfortunately Method 3 is not likely to be computationally feasible 
in the non-orthogonal case unless the number of different classes is 
small or unless the design incorporates planned non-orthogonality and 
consequently the mean squares of the analysis of variance can be com- 
puted without solving least squares equations. In these two cases the 
expectations of the mean squares are easy to compute. For example, 
the analysis of the balanced incomplete block design is simple and so 
is the writing of the expectations of the mean squares. The basic facts 
needed for employing Method 3 are stated below. 

‘Let the linear model describing y, , the a-th observation be 


(10) Ya —- ys Dia =e €a 
i=1 


The z’s are known. The e’s have mean zero, are uncorrelated, and . 
have common variance oc, . For the present we shall not specify which 
b’s are fixed and which distributed. 


Now if bj11 , «+: , 6, are set = 0, the least squares estimates of 
b,, ++: , b, are the solution to equations (11). 
(11) BC + bCie + +> + OC = Yi 
Cp, a7 ae eo. Neh = Y, 
etc. 


In equation (11) 
N 
C3; = 3 Lialja 
a=1 


N 
Y; — “> LiaYa 


a=] 


The reduction in sum of squares due to 6, , --- 0,38 


(12) R(b, , ees) b,) a ae, bay 


t=1 
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But since 
= rs Cr, 5) 
7=1 


where C*’ are elements of the matrix inverse to the Cy, tmaatrix (7) 7 = 
i ee q), 


(13) R(b; ap Raa b,) =F oD i ONY iY; 
i=1 j=1 


Using (13), the expectation of a A , +: , 0.) is easy to write, but not 
necessarily easy to compute. 


@ @ 


(14) E[R(b, aes rr} b,)] = os > CVE(Y<Y;) 
Riactwallibe wade if the-fact that-(id) can'beanitten 
(15) E[R(,, --- , b)] = = >: C.,H(b:b) + DY CB(b.b) 


t=1 j=1 t=q+1 j=1 


=a sie =e ;H(b; bi) + q’o , 


i=qt1 f=atl 


where 
cay = YY CCC. + CuCn) when n ¥ 9, 
; i=1 j=1 
& i Lee whee > Cy ce (8ee14). Riad 
4=1 f=1 at 


} ' -> Sosa 


ey = 6 a. Th only 
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Therefore, the expectation of the residual sum of squares is 
(18) E[> yz — Rib , a0 pO) | 


D 


= | > y C;,,E(b,b;) + Nat | _— fos a C,,;E(b;b,) a2 p'a!| 


t=1 7 


= (N — plot 
Suppose that b,.: , --- , 6, are independently distributed with 
means = 0 and common variance o”. This variance can be estimated 
by equating R(b, , --- , b,) — R(bi , --- , bg) to its expectation. The 


expectation of this difference is seen by reference to (15) and (17) to be 


t=at+l1 j=at+ 


a9) YY Oy = A EO.D) + @ — ao? 


= aE, (Cis — Ando” + (po — 90a, 
Then using the estimate of o; arising from (18) an unbiased estimate 
of o” can be obtained by equating R(b, , --- , b,) — R(b, , «++ , b,) to 
(19). It will be noted that the assumptions made concerning b, , --+ , b, 
are of no consequence. 
Now we shall illustrate Method 3 with our data of Table 1. If one 
were to carry out the usual tests of hypotheses by least squares, the 
following sums of squares would be computed. 


Among Years = R(years, herd X sire subclasses) — R(herd & sire 
subelasses) 
Among Herds = R(years, herds, sires) — R(years, sires) 


Among Sires = (years, herds, sires) — R(years, herds) 


Herds X Sires = R(years, herd X sire subclasses) — R(years, 
herds, sires) 


Residual = >) >) >) DO visix — R(years, herd X sire subclasses) 
h a 7 k 


The last four of these quantities can also be used to estimate o? , 
o;,., and o,. If years were regarded as random variables, the first 
would be used to estimate o; . Our present assumption is that the year 
effects are fixed, however. 

According to (15) we need not be concerned with u and the a’s in 
the expectations since the expectation of each of the above reductions 
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contains the following quantity, 


Ne + 2 Do ny..um, + >> m..02 . 
h h 


This expression vanishes in the sums of squares, which are differences 
between two reductions. 

Aside from 4 and a, terms, the expectations of the pertinent reduc- 
tions are those shown in Table 19. 


TABLE 19 
Oo}, o; o, o 

Biase : 
R(years, herd X sire 

subclasses) N N N pt+s-—1 
R(years, herds, sires) N N Ky ptqtr-2 
R(years, herds) N KR. K; p+q-1 
R(years, sires) Ky, N Ks ptr—-l 


The entries other than the K’s in Table 19 are derived from (15) 
and (16). The K’s are computed by (14). The expectations of the 
sums of squares of the analysis of variance are presented in Table 20. 


TABLE 20 
8.8. o;, o os o 
Among Herds N — Ky 0 K, — Ks |}q—1 
Among Sires 0 N — Kk, | Ki —K; |r —-1 
Herds X Sires 0 0 I ReneS Oye homlt) ime ue 
Residual 0 0 0 N-p-—s+l 


The computations of the various reductions proceed as follows in 
our example. R(years, herd X sire subclasses) = 10,973,518 as was 
shown in the description of Method 2. To obtain R(years, herds, sires) 
the equations of Table 21 need to be solved. In these equations yp is 
estimated jointly with each a, , while h, and s; are set = 0. These 
three or some other set of three restrictions on the estimates are 


necessary. 
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TABLE 21 
ay a2 a3 a4 hy he hz S81 So 

a 7 0 0 0 33 1 0 Tf 0 2931 

do 0 Al 0. 0 6 3 ie 13 8 9983 

a3 O07 0 16 0 2 9 2 0 9 6959 

4 0 0 0 13 ‘i We Q 0 0 4806 

hy 5 6 Z is 16 0 0 5 6 6632 

hy 1 3 9 2 0 15 0 4 5 6086 

hs 0 “f 2 0 0 0 9 2 6 5149 

Si dG 13 0 0 5 4 > 20 0 8838 

So Q 8 9 0 6 ti 6 0 ilé 8181 

The solution is 
Gin=stAlA Vie thy a Glo os oe oe 
a, = 419.83 h,=—8.15 s = 15.09 | 
a; = 412.39 hs = 143.05 4 
a = 368.59 
R(years, se sires) = 414.77(2931) + --- + 15.09(8181) 


= 10,921,107. 


‘In order to compute R(years, herds) the s; and s,; rows and sola 
of Table 21 are deleted and ithe ‘resulting easton solved. The solu- 


_ tion is =e) 


es can ere hy = 11.35 


ey ¥ig= Pe 9 ae er ay 
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Si) = —6.75 Ss = 48.38 
R(years, sires) = 425.46(2931) + --- + 48.38(8181) 
= 10,800,679 


The computations of the K’s in Table 19 require inversions -of 
certain matrices. To obtain K, , the inverse of the matrix of coefficients 
in Table 21 is needed. This inverse matrix is presented in Table 22. 
The entries to the left of the diagonal are omitted since the matrix is 
symmetric. 


TABLE 22 


Now we need the coefficients of 03, in the expectations of squares 
and products of the right members of the equations of Table 21. These 
_ computations are facilitated by setting up Table 23. a 
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The coefficients of o?, in the squares and products of right members 
are the squares or products of appropriate rows in Table 23. For 
example, the coefficient of o7, in # yj... is 37 + 1? + 3° = 19. That 
in E(y,...ys...) is 3(2) + 1(3) + 3(5) = 24. The complete set of co- 
efficients is presented in Table 24, with entries to the left of the diagonal 
omitted due to the symmetry of the matrix. 


TABLE 24 

ay a2 a3 a4 hy he hs $1 $2 
an 19 24 0 0 15 4 0 43 0 
Ae 79 16 0 34 12 33 71 48 
as 58 26 12 49 12 0 49 
On 65 25 12 0 0 0 
hy 86 0 0 25 36 
he 77 0 16 25 
hs 45 9 36 
Sy 114 0 
Se 97 


Multiplying and summing corresponding entries of Tables 22 and 24, 


Ky 


19(.64297) + 2(24)(.41685) + --- + 97(.28088) 
38.29 


Calculation of K, requires the inverse of the matrix of coefficients of 
Table 21 with s; and s. columns and rows deleted. This inverse is 
presented in Table 25. 


TABLE 25 
ay a2 a3 a4 hy he hs 
a .17524 | .03588 | .03770 | .03027 | —.06049 | —.04552 | —.03629 
ay .10581 | .05361 | .03375 | —.06365 | —.06022 | —.09421 
as .13358 | .03635 | —.05523 | —.09823 | —.07138 
a .10523 | —.05576 | —.04461 | —.03433 
I 12204 05734 .06178 
he 14663 06867 
hs . 20025 
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Also needed in the computation of K, are the coefficients of o? in the 
expectations of squares and products of right members of the least 
Squares equations. Table 26 facilitates this computation. 


TABLE 26 


Right members 


Sires 


jt 


bo 


ao 


Yi... 
Y2... 
Y3... 
Y4... 
fe Be 
W.2.. 
3 


“J 


— 


Or OOO WO 


OoInrnoeo oo 


ps 
Soon#wt Oo © 


The coefficient of o in H(yj...) is 77 = 49, in E(yy...y2..)- is 7(13) 


= 91, etc. The complete set is shown in Table 27. 


TABLE 27 
aq a2 a3 a4 hy he hs 
ay 49 91 0 0 35 28 21 
a2 233 12 0 113 92 87 
a3 130 91 89 87 54 
a4 169 65 78 0 
hy 86 80 51 
he a7 42 
hs 45 


Now K, = 49 (.17524) + 2(91) (.03588) + --- + 45 (.20025) 

= 42.29 ap 
K; is obtained from Table 24 and 25, thus 
K, = 19 (.17524) + 2 (24) (.03588) + +++ + 45 (.20025) 


= 31.71. 


In a similar manner K, is found to be 22.57 and K; to be 22.57 (the 
equality of K, and K; is only a coincidence.) 
Table 28 presents the pertinent reductions and their expectations 


(excluding » and a, terms) 
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TABLE 28 
oh o; Ths oe 

DLLILD rise Si 57 57 On 11,124,007 
R(years, herd X sire F 

subclasses) 57 57 57 13 10,973 518 
R(years, herds, sires) lire 57 38.29 9 10,921 107 
R(years, herds) 57 42.29 as 7al ff 10,919 ,698 
R(years, sires) 22.57 57 22.08 6 10,800 ,679 


Then the equations to be solved are those of Table 29. 


TABLE 29 
o a; Ths oe 
Among Herds 34,43 0 15.72 3 120 ,428 
Among Sires 0 14.71 6.58 2 1,409 
Herds X Sires 0 0 18.71 4 52,411 
Residual 0 0 0 44 150 ,489 
The solution to these equations is o; = 2255, o. = —1295, 0, = 


2070, and o; = 3420. 


ESTIMATION OF COMPONENTS OF COVARIANCE 


The same general principles described in Methods 1, 2, and 3 for 
estimating variances can be employed to estimate covariances. To 
illustrate, suppose an observation is made on each of the progeny re- 
sulting from single crosses among inbred lines. If y;;, is the observation 
on the k-th progeny of the 2-th male line by the j-th female line cross, 
a model which might reasonably be assumed is 


Yim = BP Ge PR my Pe 857 + Cin 
where g; is the general combining ability of the 7-th line, g; is the general 
combining ability of the j-th line, m; is the maternal ability (exclusive 
of the genes transmitted to the progeny) of the j-th line, s,; is peculiar 
to crosses 7 X j and of j X 2, and e;;, is arandom error. Suppose further 


that the elements of the model fit Eisenhart’s Model II except that 
E(g.m;) = Ogm - 
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The problem is to estimate the variances and o,, . If Method 3 is 
used, unbiased estimates of the variances are obtained. If Method 1 is 
used, the estimates are biased due to the presence of ¢,, . If the least 
squares estimates of g; and m; are computed, an unbiased estimate of 
ym can be derived from >>, g; m; , the expectation of which is (p —1)o 9m 
— ras ki oe , where p is the number of lines and k; o? is the covariance 
between g; and Mm; , assuming that g; and m, are fixed. 

A more frequently occurring type of covariance estimation problem 
in animal breeding arises in connection with estimation of genetic and 
phenotypic correlations. Observations are taken on two or more traits 
in some population. The following linear models characterize such 
observations on two traits. 


oD bia + Ca 


i=1 


(20) Yo 


Dp 
y, = >> dbia:, + e 
t=1 


b; and bi are fixed fori = 1, -:- , q 
b; and b{ arerandom variables fort = q+ 1,°+::,p 
So far as the random variables are concerned, 
Eb; = Eb = 0 E(ei) = a 


E(b;)’ E(b;b4) Oi ge 


Eby = o; EX@,6)) -="eie 
EG.) -= 0, All other covariances = 0 


Now in place of least squares reductions like 
uate | oral Cas 


we substitute 
ys. boy: or by ViVi /Ci ) 


where Y,, , refers to a right member of the least squares equations for 


the second trait. 

Then the expectations of these reductions are exactly the same as 
described for estimation of variance components except that o;; , 1s 
substituted for o? and o,,, , is substituted for o¢. Therefore, any of the 
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three methods for estimating variances can be used equally well for 
estimating covariances when (20) is the model. 
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QUERIES 


GrorcE W. SNEDECcCOR, Editor 


QUERY: We have been using the technique given in Snedecor’s 
100 “Statistical Methods”, section 11:10 and have come across some 

results which have raised a question of correct procedure. The 
data are the weights (grams) of pete peepbrane in swine of the 25th. 
day of Eisen: LS 
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Since interaction may be assumed negligible, we proceeded as follows: 


Litter Size Sum of Squares, unadjusted = 11.06 
Litter Size Sum of Squares, adjusted = 24.28 
Correction for Disproportion = —13.22 


Is it possible to have a negative correction? If so, is it correct to 
use algebraic subtraction in this next step? 


Mating Sum of Squares, unadjusted =" 24,82 


Correction for Disproportion = —13.22 


Difference for Adjusted Sum of Squares = 38.04 


For the 2 X 2 table it is possible to have a negative 
ANSWER: correction to be subtracted algebraically as you have done. 

That is, the adjusted sum of squares may be larger than 
the unadjusted. 

For the R X 2 table, the method of adjustment given in table 11.22 
of my text is incorrect. My attention was called to this recently by 
Dr. J. O. Irwin. This adjusting device applies to the 2 X 2 table but 
not to the larger one. 

A method for calculating the adjusted sum of squares for the rows 
is illustrated below. It is similar to the scheme for calculating the 
interaction in table 11.22. The distinction is that the sums, S = %, + #2, 
are used instead of the differences. 


W = _ take S=4,+ 4% WSs 
ky + ke 
Hoist ys 22220 29 .6326 
0.8000 13.675 10.9400 
1.2000 17.800 21.3600 
1.5000 16.633 24.9495 
= W = 4.8333 = WS = 86.8821 
>> WS’ = 1,603.38 
(>> WS)?/>> W = 1,561.77 


‘Sum of Squares formating = 41.61 
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The final analysis of variance for your data is in this form (I carried 
more decimals than you did). 


Source of Degrees of Sum of Mean 
Variation Freedom Squares Square 
Litter Size 1 24.32 24:32 
Mating 3 41.61 13.87 
Interaction 3 12.69 4,23 
Error 14 108.23 7.73 


QUERY: In three experiments I have observed that treatment 
101 A produces more than B, but in no case was the difference 

significant. The experiments differed in design so that I cannot 
combine the original observations. Yet I think there should be some 
way to take advantage of the fact that in every experiment the proba- 
bility was less than 0.2. It doesn’t seem that this would be likely to 
happen if there were no real difference in yield. 


R. A. Fisher devised a method for combining comparable 
ANSWER: probabilities such as you seem to have. See his “‘Statistical 

Methods for Research Workers’, section 21.1. Bancroft 
and Anderson in their “Statistical Theory in Research”’, section 12.6, 
have emphasized the allowable variation in design. 

The method depends on the facts that (i) the sum of several values 
of chi-square is itself distributed as chi-square with the appropriate 
number of degrees of freedom and that (ii) minus twice the natural 
logarithm of a probability is distributed as chi-square with 2 degrees of 
freedom. 

As an example I shall parallel Fisher’s illustration, usmg common 
logarithms and changing the probabilities to conform to your specifica- 
tion. 


BP —log P Degrees of Freedom 
0.145 0.8386 2 
0.200 0.6990 2 
0.087 1.0605 2 
Total 2.5981 6 


x2 = 2(2.3026)(2.5981) = 11.965 
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The factor 2.3026 changes the common logarithm to the natural. 

For 6 degrees of freedom, x” = 11.965 corresponds to the probability 
of about 0.062. Thus, the evidence against the null hypothesis is 
greater than that in any of your individual experiments. 


QUERY: [I set up a 2° X 4 factorial experiment to study the 
102 effect of light on the feathering of chicks. I planned to use 

the second and third order interactions to estimate error, but 
the four birds in one lot died from causes not associated with the treat- 
ment. Is there any way to analyze the variance of the remaining 
treatments? The data enclosed are percentages of undesirable feathers 
at 10 and 12 weeks of age. 


R. L. Anderson suggested a way to analyze the variance 
ANSWER: of an unreplicated experiment with a missing treatment. 
“Tf high-order interactions are used as estimates of error, 
missing values should be determined by the process of minimizing the 
error variance.” Biometrics Bulletin, Volume 2, page 43, 1946. 
For illustration, it will be sufficient to use only part of your data. 
I have chosen the two levels of starting light treatment, s, (10 and 15 
hours per day during the first six weeks), the two dates of killing, k, 
(10 and 12 weeks), and three of the levels of finishing light treatment, 
f, (12, 18 and 24 hours from six weeks to killing). Again, for illustration 
only, I shall assign to the estimate of error not only the 2 degrees of 
freedom for the three-factor interaction, SKF, but also the 2 for the 
interactions of deviations from linear trend with the 2-level treatments, 
SF and KF. 
The following work sheet contains the data for females, among 
which the missing datum occurred. 
The sum of the four indicated squares is 381.85 — 23.627 + 0.52”. 
This is minimized by x = 23.62. For those who are not familiar with 
the rule for differentiation, use the formula, 


_ —(coefficient of 2) 
2(coefficient of x”) 


The substitution of 2 = 23.62 in the sum of squares for error yields 
Sum of Squares for Error = 102.90 
Mean Square (4 — 1=3df.) = 34.30 


The degrees of freedom for error are, as usual, decreased by one to 
compensate for the missing datum. 
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The value x = 23.62 is now substituted in the difference for each 
effect. not assigned to error and the mean square calculated in the 
regular way. 

Professor Anderson writes as follows: ‘‘As you indicate, the treat- 
ment comparisons will be slightly biased if the usual analysis of variance 
is used with the missing value. One might suggest using the covariance 
technique I used in my Biometrics Bulletin article on ‘Missing Plot 
Techniques.’ But since each comparison has a single degree of freedom, 
it might be easier to compute the variance for each comparison (hence 
adjust the variance instead of the numerator). However even this 
would be so complicated that I doubt if anyone would bother to adjust 
for the bias.” 


ABSTRACTS 


ENAR Joint Meeting of The Biometric Society and The Institute of 
Mathematical Statistics, April 29-M ay 1, 1958. 


J. EDWARD JACKSON AND ROBERT H. MORRIS. 

212 (Eastman Kodak Company, Rochester, New York). The Ap- 
plication of Multivariate Quality Control to a Photographic 
Problem. 


The use of univariate Quality Control procedures in photographic 
problems fails in many cases because several highly correlated measure- 
ments must be studied simultaneously. The use of Hotelling’s T” in the 
form T”? = (x — x)S-'(x — a)’ leads to practical difficulties because the 
determinant of S is usually quite small. However, the use of Principal 
Components in conjunction with the 7” technique offers a workable 
control tool which, in addition, supplies better photographic measure- 
ments than those previously used. A practical example will be given 
illustrating the use of this technique on an actual process control prob- 
lem. 


H. C. BATSON. (University of Illinois College of Medicine, 
213 Chicago). Factorial Chi-square Analysis of Data from Experi- 
ments in Immunology. 


The previously unpublished method developed by A. E. Brandt for 
Chi-square analysis of attribute data from factorial experiments is pre- 
sented in sufficient detail to permit employment of the technique by 
others. The method, designated “Factorial Chi-square” by Brandt, is 
essentially a form of analysis of variance employing normalized estimates 
of variance. The error term used in variance ratios is the normalized 
population variance calculated from the totals of the outcome groups in 
each experiment. Factorial coefficients are used directly in calculating 
Chi-square values based on individual degrees of freedom. Applications 
of the method are illustrated by means of examples taken from experi- 
mental immunology. Included among the examples are a 2 X 2 factorial, 
a2 X 2 X 3 factorial, a 2 X 2 factorial with 4-fold replication and a 
single classification experiment in which the number of individuals per 


experimental unit are unequal. 
259 


260 BIOMETRICS, JUNE 1953 


14 IRWIN D. J. BROSS. (Cornell University Medical College). 
Applications of Non-Parametric Methods to Medical Data. 


Four rank order significance tests are briefly described with the aid 
of a Drion diagram (Ann. Math. Stat., 1952, pp. 653-74). Pros and 
cons of the methods as applied to medical data are considered. Three 
illustrations are given of situations where the methods seem especially 
appropriate. In particular it is shown that Drion’s small sample version 
of the Smirnov test can be used to construct self-stopping follow-up 
schemes that would appear to be of value in clinical trials. 


15 DAVID G. KENDALL. (Magdalen College, Oxford and Prince- 
ton University). Stochastic Growth and Mutation Processes. 


The paper presents a simple technique (worked out in collaboration 
with W. A. O’N. Waugh) for dealing with the Bellman-Harris integral 
equation for stochastic growth, when the division-time distribution, 
dG(r), is a weighted convolution of x3-distributions. The method is then 
applied to the formulation and study of a stochastic model of bacterial 
mutation incorporating “phenotypic delay”’. 


GEORGE E. P. BOX AND W. A. HAY. (Institute of Statistics, 
Raleigh, N. C. and Imperial Chemical Industries, Manchester, 

216 + kEngland). Statistical Designs for the Efficient Removal of 
Trends Occurring in Comparative Experiments with Applications 
in Biological Assay. 


Consider the regression of a quantitative result y, which is measur- 
able but subject to error, on a quantitative factor 2, which can be fixed 
by the experimenter at any desired level. For example in biological 
assay y would be biological response and x the dose of drug. <A design 
is derived by means of which two such regressions (corresponding for 
example to a test preparation and a standard preparation of a drug) may 
be compared whilst eliminating, without loss of efficiency, a smooth time 
trend in the response y, which occurs due to factors outside the control 
of the experimenter. The experiments are performed in k pairs. In the 
u-th pair a dose d,, of the test preparation and a dose ad, of the standard 
preparation is administered, where u = 1, 2, --- , k and a is a suitably 
chosen constant so that roughly the same response is obtained with test 
and standard preparations. The doses d, are such that d, , d?, -+- , d? 
are orthogonal with the elements ¢, ¢’, --- , ¢” of a polynomial in time 
fitted to the pair means, and q and p are so chosen that the polynomials 
adequately represent the dose and time effects. In the important special 
case where a linear time trend is adequate the device of angular random- 
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isation (Box, G. E. P. “Multi-factor Designs of First Order’, Biometrika, 
Vol. 39 (1952), pp. 49-57) may be used to select the design. This ensures 
that subsequent normal theory statistical tests applied are exact what- 
ever the distribution of the response and whether the mathematical 
model can be strictly justified or not. 

A method is indicated whereby residual time effects within pairs may 
be eliminated using the between pairs trend estimate. 


W. S. CONNOR AND W. H. CLATWORTHY. (National 
217 Bureau of Standards). Necessary Conditions for the Existence 


of Partially Balanced Incomplete Block Designs with Two 
Associate Classes. 


For a partially balanced incomplete block design with two associate 
classes and with parameters v, b, r, k,n: , M2, 1, A2, and pj, (2,7,4 = 1,2), 
the following theorem has been proved: If (i) v > b, then it is necessary 
that (a) A be a perfect square and (b) either r — r, = 0, orr — r, = 0; 
(ii) v = b and v is even, then it is necessary that (a) A be a perfect square 
and (b) r — r, be a perfect square when ais odd (u = 1, 2); (iii) v = 6, 
v is of the form 4¢ + 3 (¢ = 0, 1, 2, ---), and A is not a perfect square, 
then it is necessary that (r — r,)(r — r.) be a perfect square, and (iv) 
v < band vis even, then it is necessary that A be a perfect square where 
rm = HA. — (-y + (-)*VWA) + OA +r), & = 1, 2), 
Y= Pi — P2,4=7 + 284+ 1,8 = pi2 + pi2, and a, and a, are 
nonnegative integers such that a, + a, = v — 1. Examples are given of 
sets of parameters which fail to satisfy these conditions. 


A. C. COHEN, JR. (The University of Georgia, Athens). Esti- 
218 mation in Truncated Bivariate Normal Distributions (Preliminary 
Report). 

Maximum likelihood estimators of the parameters of a bivariate 
normal population are developed for samples which are subjected to a 
truncation on one of the variates at known terminals. Both single and 
double truncations with the number of missing (unmeasured) observa- 
tions either known or unknown are considered. Asymptotic variances 
of the estimates are obtained from the likelihood information matrices. 


219 R. F. DRENICK AND P. NESBEDA. (RCA Victor Division, 
Camden, N. J.). Ona Class of Optimum Linear Predictors. 


Prediction is the problem of projecting into the future a set of ob- 
served data in order to obtain estimate for future observable data. For 
optimum prediction one assigns, through some considerations which are 
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not part of the method, a loss function representing the penalty for 
error. An optimum prediction procedure is the one which minimizes, 
in the long run, this penalty. N. Wiener pointed out that the optimum 
mean square predictor is linear if the interference affecting the observa- 
tions has gaussian probability distribution. By using a method of 
estimation due to Pitman (Estimation of location and scale parameters. 
Biometrika 30 (1939)) the authors show that the class of linear predictors 
is characterized by the gaussian probability distribution and by a loss 
function more general than rms, namely, one which is symmetric and 
has continuous derivatives. Most of the loss functions of practical 
interest are in this category. Furthermore any such loss function leads 
to the same linear predictor X, which has also the property: P(X, — 
az < k) = max for all k > 0, x being the true value. (Work sponsored 
by the Bureau of Aeronautics.) 


D. B. DUNCAN. (Virginia Polytechnic Institute, Blacksburg). 
220 Multiple Range Tests and the Multiple Comparisons Test 
(Preliminary Report). 


Several methods are available for testing differences between treat- 
ments in an analysis of variance. The two considered most satisfactory 
are one by Newman (1939) and Keuls (1952) and the Multiple Com- 
parisons Test by Duncan (1951). Both employ repeated homogeneity 
tests. The Newman-Keuls test is simpler because it uses repeated range 
tests instead of F tests as used by the Multiple Comparisons Test. The 
latter is generally more sensitive owing partly to this reason but mostly 
to the relaxation of the significance levels of some of the tests considered 
to be of diminished importance. This paper presents: a new Multiple 
_ Range Test which achieves the simplicity of the Newman-Keuls test by 
using range tests and most of the sensitivity of the Multiple Comparisons 
Test by using the special significance levels, and an improved set of 
application rules for the Multiple Comparisons Test. Each of these is 
recommended for use depending on the relative needs for simplicity or 
sensitivity. The special system of significance levels is discussed in some 
detail. The author is indebted to W. Beyer in the determination of sig- 
nificant ranges for the new test which is still in progress. 


NORMAN LLOYD JOHNSON. (Institute of Statistics, Chapel 
221 =~ Hill, N. C.). Sequential Procedures in Component of Variance 
Problems. 


Three types of sequential procedure are discussed. (1) Procedures 
based on the use of sample variance at each stage, (2) procedures based 


. 
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on successive independent groups of samples, and (3) procedures based 
on the isolation of independent degrees of freedom at each stage. 

In cases (2) and (3) approximate formulae for the average sampling 
number (ASN) are evaluated. It is found that (3) gives a lower ASN 
than (2) when the successive groups in (2) are of small size but that (2) 
gives the lower ASN (formally) for larger sized groups. It is conjec- 
tured that the latter (ASN for large group size in (2)) may be related to 
the ASN under procedure (1). 


MARVIN ZELEN. (National Bureau of Standards). The 
222 Analysis of Some Incomplete Block Designs with a Missing 
Block. 


During the course of carrying out an experiment the situation may 
arise where all the observations from a particular block are missing. 
This paper outlines the statistical analysis if a block is missing for (a) 
balanced incomplete block designs, (b) partially balanced incomplete 
block designs of the group divisible type, (c) simply linked block designs, 
and (d) several miscellaneous cases which depend on the relationship 
of the treatments in the missing block to each other. 


C. I. BLISS, THEODORE GREINER AND HARRY GOLD. 
223 (Yale University and Cornell University Medical College). 
Estimating the Dose of a Cardiac Glycoside for Human Subjects. 


The threshold dose of digitoxin has been estimated for each of 89 
human subjects, ranging in age from 3 to 74 years. When dosage was 
expressed in micrograms per pound of body weight, children required a 
59% larger dose than adults. Four partial regression equations have 
been computed from the same data, for children and adults separating 
males and females, in each equation relating the logarithm of the thres- 
hold dose per individual to his log-weight, log-height and age. In every 
case, the regression on age was less than its error. Moreover, the regres- 
sion equations of log-dose upon log-weight and log-height did not differ 
significantly between children and adults in either sex. Combining age 
groups within each sex, log-height contributed no information and the 
regressions on log-weight for males and for females could be fitted by 
parallel lines with a slope of .55, with males requiring 27% more digitoxin 
than females. Adjusting dosage approximately to the square root of the 
body weight should correct adequately for variations in the age and size 


of patients. 
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CLYDE Y. KRAMER AND DAVID B. DUNCAN. (Virginia 
224 Polytechnic Institute, Blacksburg). On the Analysis of Variance 
of a Two-Way Classification with Unequal Subclass Numbers. 


This paper reviews important current methods on the analysis of 
variance of a two-way classification with unequal subclass numbers, 
presenting them from a unified point of view and also proposes a new 
method which has special advantages for particular situations. 

For cases in which no interaction is present, the most efficient pro- 
cedure is the method of fitting constants. The method of weighted 
squares of means is much simpler to apply but would generally sacrifice 
too much efficiency. The virtue of the new method is that it is equally 
simple for such cases and is more efficient than the method of weighted 
squares of means. 

There are situations in which this would not be true and in which 
case the method of weighted squares of means should be used. A quick 
test for recognizing situations of this type is provided. 

The new method tends to give weight to subclass means more in 
proportion to the numbers on which they are based than does the method 
of weighted squares of means. When the number of subclasses are large 
the time and-effort saved by the proposed method may outweigh the loss 
of efficiency. 


ALBERT B. PARKS. (Bureau of Human Nutrition and Home 
225 Economics, U.S.D.A.). Ranking Versus Scoring in Palatability 
Tests Using Small, Trained Panels. 


A design for the simultaneous scoring and ranking of samples in 
palatability tests is presented and illustrated by an experiment with a 
small, trained panel used in the detection of off-flavors. Parallel sets of 
results, in the form of mean scores as compared with estimated treat- 
ment ratings based on ranks, are discussed in terms of validity, consis- 
tency of tests of significance, relative discrimination among treatments, 
and performance of panel members. It is suggested that this design 
may be useful in testing a ranking technique when a series of experiments 
is in progress and a scoring method has been employed. Continuity of 
past results is maintained. 


MAX A. BERSHAD, WILLIAM N. HURWITZ AND RALPH 
226 S. WOODRUFF. (Bureau of the Census). Sampling for Time 
Series. 


This presentation examines a rotation pattern for a sample which is a 
composite of the optimum rotation for month-to-month change and the 
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optimum for totals (or averages) over a time period. In connection with 
this rotation pattern, an estimate, which is a composite of a chained 
estimate and a simple unbiased estimate, is examined. The variances 
of some common statistics are presented for the composite estimate and 
compared with the variances for the chained and unbiased estimates. 
The composite estimate is proved to be the best linear estimate of level 
for any one time period when all past observations, first assumed as 
numerous, are taken into account. The form of the estimate when many 
observations are not available is given and is indicated as approximated 
well by the formula for many observations. 


227 WALTER A. HENDRICKS. (Bureau of Agricultural Econom- 
ics). Response Rates and Selectivity in Mail Surveys. 


Cumulative means per sampling unit from successive mailed returns 
often follow a simple equation of the form, Y = aX’. This provides a 
simple basis for extrapolating estimates to a 100 percent-return equiva- 
lent. Response rates seem to follow a law identical to relationships 
found in dosage-mortality studies in biology. These relationships seem 
to hold when surveys are conducted on a particular subject in a restricted 
universe. When general-purpose surveys are conducted, the relationship 
between percentage returns and means per sampling unit for individual 
items becomes more complex. In general, there seem to be factors which 
induce a respondent to fill out a mailed questionnaire and other factors 
which work simultaneously in the opposite direction. The net effect 
of these opposing influences governs the response rate to a mailed 
survey. The particular way in which the items to be estimated from 
the survey are correlated with the factors affecting response determines 
the form of the relationship between cumulative means and cumulative 
response rates from repeated mailings. In general-purpose surveys there 
does not seem to be any easily predictable pattern because of the multi- 
plicity of interrelationships. 


French Region, February 25, 1953. 


J. SUTTER AND L. TABAH. Resultats au Test Mosaique 
228 de Gille dans 1.244 Fratries de Deux Enfants et 380 Couples de 
Jumeaux. 


Dans les fratries de deux enfants, aprés réduction du facteur Age, les 
performances des gargons précédés ou suivis d’un gargon, apparaissent 
significativement inférieures 4 celles des gargons précédés ou suivis 
d’une fille. Par contre, les filles ne sont pas influencées par la composition 
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de la fratrie. Il existe une corrélation positive trés significative entre 
les performances au test dans les fratries et la durée de Vintervalle 
intergénésique. Le rang de naissance n’exerce pas d’influence dans les 
diverses catégories de fratries, sauf dans les fratries de deux gargons ou 
deux filles, urbaines et dont l’intervalle intergénésique est faible; les 
premiers nés apparaissent alors inférieurs aux deuxiémes nés. Le coeffi- 
cient de corrélation intra-classe apparait de l’ordre de 0,5 dans toutes les 
catégories de fratries. 

Parmi les couples de jumeaux, dont il n’a pas été possible de distin- 
guer les monozygotes des dizygotes, les performances apparaissent in- 
férieures 4 celles des non jumeaux. Les couples de garcons sont infé- 
rieurs aux couples de filles. Les coefficients de corrélation sont de 0,842 
chez les garcons, 0,823 chez les filles et 0,781 dans les couples hétéro- 
genes en sexe. 


British Region, February 26, 1958. 


229 KATHERINE H. COWARD. The Use of the Logistic Curve 
in Bio-Assay. 


The logistic curve best fitting the data may be determined by 
strating from trial estimates of the ceiling and base values, and then 
modifying these values until the average of the ratios of the doses corre- 
sponding to successive Y-values most nearly approaches the known 
ratio of the doses given. Test and Standard can be worked out side by 
side. 

The potency of the Test Substance in terms of Standard is given by 
(a) the number of units of Standard and (b) the weight of Test Sub- 
stance corresponding to one standard interval of the curve. (Ref: 
Emmens, C. W.: J. Endocrinology, 1940, 2, 194). 


230 J.O. IRWIN. On the ‘Transition Probabilities” Corresponding 
to Any Accident Distribution. 


From any known distribution of accidents in a fixed exposure time 7, 
the expected number of other accidents sustained by a person who has 
x accidents is obtained. It is the ratio of the x + 1-th to the z-th 
factorial moment of the distribution. Its limiting value when the expo- 
sure time tends to zero gives the transition probabilities. (To appear in 
J. Roy. Statist. Soc. B.) 
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J. A. FRASER ROBERTS. The Use of Regressions Involving 
23] Variances of Independent Variates for Calculating Age-Corrected 
Scores. 


It is often useful to calculate once and for all an age-corrected score 
for each individual. If there is a substantial change of variance with age 
as well as of mean, the scores can be adjusted by using the regression of 
the variance of the measurement on age (weighting the variance in each 
array by the number in the array). The fit of the regression may be 
tested by a method due to R. A. Fisher. 

The mean square, derived by dividing the weighted sum of squares 
of deviations from the regression line by k — r, where k is the number 
of arrays and r — 1 the order of the polynomial is: 


Dd (Yp — Y,)°/(k — 1) (1) 


The theoretical sampling variance of an estimate y of a true variance Y, 
based on a sample of n, is 2Y"/n. (1) therefore estimates approximately 
the mean value of 2Y*, taken over the k arrays, namely 2 >, Y3/k. The 
approximation, which should lead to little discrepancy, is due to the 
fact that the observed variances are weighted only according to the 
number of observations and not according to estimates of their true 
values. Thus the mean square due to linear regression is compared with 
twice the square of the mean variance, with degrees of freedom 1 and ~; 
the extra mean square due to the quadratic term with 2/k times the 
sum of squares of the expected variances given by the linear function, 
etc.; the remainder with 2/k times the sum of squares of the expected 
variances given by the highest polynomial fitted, with degrees of free- 
dom (k — r) and ~. 

Application of the method to the Terman-Merrill revision of the 
Binet Scale removes the great bulk of the very large and troublesome 
residual association between age and I.Q. 

Mean blood-pressure grows steadily throughout life and variance 
increases more than fivefold. In a genetic investigation it is essential 
to be able to compare and add in various ways persons of very different 
ages, for example, the parents, the sibs and the children of a sample of 
subjects. Adjustment for variance as well as for mean has provided 
individual scores which are almost entirely independent of age. (See 
J. A. F. Roberts and M. A. Mellone. British Journal of Psychology, 
Statistical Section, 5, 65, 1952; G. W. Pickering, G. S. C. Sowry, M. 
Hamilton and J. A. F. Roberts. Clinical Science, in preparation.) 


THE THIRD INTERNATIONAL BIOMETRIC CONFERENCE 


Preliminary Programme 


Lake of Como, Bellagio, Italy—September 1-5, 1953 


1st September 


9:00 A.M. 


10:00 A.M. 


12:30 P.M. 
3:00 P.M. 


Inauguration: Welcoming address and Presidential ad- 
dress. 


The first course in biometry—a symposium (Chairman: 
W. G. Cochran) 

L. Martin. Enseignement des principes d’expérimentation 
des méthodes statistiques 4 des biologistes dans deux 
éstablissements belges d’enseignement supérieur. 

G. Barbensi. L’insegnamento della Biometria. 

C. I. Bliss. A course in Biometry for graduate students in 
Biology. 

By Title: A. Vessereau. Enseignement des methodes 
statistiques appliquées 4 la Biometrie. 


First general business meeting. 


Mathematical problems in Genetics (Chairman: A. Buzzati- 
Traverso) 

Sir Ronald Fisher. The variability in the length of germ 
plasm still heterogeneous after a given amount of 
inbreeding. 

K. Mather. The methodology of Biometrical Genetics. 

A. R. G. Owen. Experimental designs in Genetics. 

D. Lowry. Variance components with reference to genetic 
population parameters. 

C. A. B. Smith. The calculation of correlation between 
cousins. 


2nd September. 


9:00 A.M. 


Methodological Problems in Biometry (Chairman: Gertrude 
M. Cox) 

J. W. Hopkins. Some needed significance tests. 

F. Anscombe. Fixed-sample-size analysis of sequential 
observations. 
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3:00 P.M. 


W.G. Cochran. The combination of estimates from differ- 
ent experiments. 
N. Blomqvist. Rank analysis of incomplete block designs. 
M. Keuls. Testing differences between means in an analy- 
sis of variance. 
M. J. R. Healy. Decision between two alternatives: how 
many experiments? 
By Title: L. Martin. Suggestions for longitudinal data 
in gerontology. 
J. Hemelrijk and J. H. B. Kemperman. Use of 
a sequential Wilcoxon-test for diagnostic 
purposes. 


Biometry in Immunology (Chairman: H. C. Batson) 
R. Prigge. Die Anwendung der Mutungsbereiche in der 
-Immunitaetsforschung. 

I. Ipsen. Factors of dosage and host determining antibody 
response to secondary antigen stimulus. 

L. B. Holt. Quantitative studies in diphtheria prophy- 
laxis—an attempt to derive a mathematical charac- 
terisation of the antigenicity of diphtheria prophy- 
lactics. 

S. Peto. A dose response equation for the invasion of 
microorganisms. 


3rd September. 
9:00 A.M. Biometric methods in Agriculture (Chairman: P. V. Suk- 


2:30.PR:M. 


hatme) 

F. Yates. The place of simple experiments on cultivator’s 
fields in agricultural development. 

V. G. Panse. Principles of the survey method of experi- 
mentation. st 

T. N. Hoblyn and S. C. Pearce. Biometrical peel in 
research on long-lived plants. 

H. Strecker and J. Raab. Methodological peep lentee ina 
survey of milk production of hog breeders in Germany. 

G. Rasch. On different sources of errors and the advantage 
of their knowledge in planning experiments. 

J. L. Lush. Estimating heritabilities. 

M. Matemura. Designs for agricultural research in Japan. 


Excursion. 
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4th September. 


9:00 A.M. Functional relations in experimentation (Chairman: H. 
Wold) 
D.J. Finney. Functional relationships in experimentation. 
J. Berkson. Minimum chi square and maximum likelihood 
estimates of regression coefficients. 
J. Neyman. Frisch’s problem on linear structural rela- 
tions. 


12:00 A.M. Biometrics business meeting. 


3:00 P.M. Contributed papers (Chairman: M. J. R. Healy) 
A. F. Parker-Rhodes. Estimating populations of irregu- 
larly observable organisms. 
D. W. Goodall. Factor analysis in Plant Sociology. 
E. F. Scott. Bivariate contagious distributions. 
G. Karreman. The mathematical biology threshold and 
related phenomena in excitation. 
By Title: M. W. Bentzon. On the statistical evaluation 
of dose response curves in case the dose 
intervals are ee 


— 


8:00 P.M. Social dinner. 


sth September. ~ 


9:00 A. M. Industrial Ge ae : igi (Chairman: 
ma TY: sb, mi Sait A ne | 


THE BIOMETRIC SOCIETY 


Indian Members. The Indian group sponsored jointly with the 
Indian Society of Agricultural Statistics a symposium in New Delhi 
on the 26th of February on “New Problems in Experimentation.” 
Frank Yates presided and the speakers included D. J. Finney, K. R. 
Nair, V. G. Panse and others. 

British Region. The British Region held a joint symposium at the 
Wellcome Research Institution in London on March 11th on “Flavour 
Assessment” in collaboration with the Society of Chemical Industry 
(Food Group) and the Society of Public Analysts (Biological Methods 
Group). The program consisted of papers by E. D. Adrian on The 
physiological background of flavour assessment, by H. G. Harvey on 
Basic considerations in regard to flavour assessment, by A. 8S. C. Ehren- 
burg and J. M. Shewan on The objective approach to sensory tests, by 
J. M. Harries on Sensory tests and consumer acceptance, and by 
J. O. Irwin on A biometrician’s viewpoint. A more complete account 
of the meeting has been published in Nature for April 25, 1953. 

Italian Region. The Third Italian Meeting was held at the Uni- 
versity of Florence on April 2nd. In the morning session on ‘‘Genetic 
Selection” papers were presented by I. M. Lerner, currently a guest at 
the University of Pavia, on The problem of quantification of selection 
methods, and by R. Scossiroli, also of Pavia, on Statistical analysis of 
selection experiments in Drosophila populations treated with X-rays. 
The afternoon program on ‘General Methodology” offered lectures by 
F. Brambilla, of the University of Genoa and the University Bacconi, 
Milan, on Confluence analysis, and by G. Barbensi on Criticisms on the 
use of chi-square in the fourfold table. At the annual business meeting, 
which followed the morning scientific program, L. L. Cavalli reported on 
the activities of the year. Statutes for the Region were discussed and ~ 
approved. It was agreed to send a letter over the signatures of several 
university professors in medical and biological faculties recommending 
that these faculties provide teaching in statistics. This letter is to be 
given as wide circulation as possible. The Region will sponsor a 
duplicated bulletin of methodological papers of general interest, with 
emphasis on the popularization of statistical methods. It will be sent 
without charge to each member. Through the initiative of Professor 
Brambilla, a Methodological Section has been formed which holds 
weekly seminars at the Department of Statistics of the University 


Bacconi of Milan. 
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ENAR. On April 29th to May Ist the Eastern North American 
Region met jointly with the Institute of Mathematical Statistics in 
Washington, D. C. The opening session on April 29th considered 
“Statistics in the Physical Sciences” with papers by J. E. Jackson and 
R. H. Morris on The application of multivariate quality control to a 
photographic problem, and by J. M. Cameron on Control and measure- 
ment of experimental error. On Thursday ‘Statistics in the Biological 
Sciences” provided the morning program, with papers by H. C. Batson 
on Factorial chi-square analysis of data from experiments in immunol- 
ogy, by I. D. J. Bross on Applications of nonparametric methods to 
medical data, by D. G. Kendall on Stochastic growth and mutation 
processes, and by G. E. P. Box and W. A. Hay on Statistical designs for 
the efficient removal of trends occurring in comparative experiments with 
applications in biological assay. The afternoon program offered papers 
by W. H. Clatworthy and W. 8S. Connor on Necessary conditions for the 
existence of partially balanced incomplete block designs with two associ- 
ate classes, by A. C. Cohen, Jr. on Estimation in truncated bivariate 
normal distributions, by R. F. Drenick and P. Nesbeda on A class of 
optimum linear predictors, by D. B. Duncan on Multiple range tests 
and the multiple comparisons test, by N. L. Johnson on Sequential 
procedures in component of variance problems, and by M. C. K. 
Tweedie on Sequential estimation. On Friday papers were given by 
M. Zelen on The analysis of some incomplete block designs with a missing 
block, by A. W. Kimball on The fitting of multi-hit survival curves, by 
C. I. Bliss, T. Greiner and H. Gold on Estimating the dose of a cardiac 
glycoside for human subjects, by C. Y. Kramer On the analysis of vari- 
ance of a two-way classification with unequal sub-class numbers, by 
A. B. Parks on Ranking versus scoring in palatability tests using small, 
trained panels, by M. A. Bershad, W. N. Hurwitz and R. 8. Woodruff 
on Sampling for time series, by W. A. Hendricks on Response rates and 
selectivity in mail surveys, by R. C. Bose and 8. N. Roy on Simultaneous 
confidence interval estimation and testing of hypotheses. 

Japanese Members. Largely through the efforts of Mr. M. Hata- 
mura of the National Institute of Agricultural Sciences in Tokyo, the 
Society has at this date 25 members in Japan. Mr. Hatamura has agreed 
to serve provisionally as National Secretary. He reports that a meeting 
is planned in the near future to develop a program for Japan. 


NOTES 


SUMMER STATISTICAL SEMINAR 


University of Connecticut, 1953 


The Summer Seminar in Statistics will meet for the fourth year at the 
University of Connecticut during the three weeks of August 10 through 
28, 1953. The general pattern of programs is to be the same as in previ- 
ous years. Informality and discussion will be stressed. There will be one 
or two seminar sessions each day and a clinic on the treatment of prob- 
lems in application. 

The first week, August 10 through 14 will be devoted to Statistical 
Methodology in Physics. It is being organized by Dr. E. W. Pike, 
Equipment Engineering Division, Raytheon Manufacturing Company, 
Newton 58, Mass., together with Dr. Churchill Eisenhart and Dr. E. P. 
King. 

The second week, August 17 through 21 will be devoted to Statistics 
in Biometry and Medicine. It is being organized by Professor G. Beall, 
Statistical Laboratory, University of Connecticut, Storrs, Conn., to- 
gether with Dr. I. Bross and Dr. D. Mainland. 

The third week, August 24 through 28 will be broken into two parts. 
The first will be devoted to the ASA Handbook, as organized by Professor 
F. Mosteller, Laboratory of Social Relations, Harvard University, Cam- 
bridge 38, Mass. The second part will be devoted to Performance and 
Reliability of Complex Mechanical Assemblies, as organized by Pro- 
fessor G. H. Shortley, The Johns Hopkins University, 6410 Connecticut 
Avenue, Chevy Chase, Maryland. ' 

Anyone interested in the subjects under discussion is invited to 
attend for the day, week, or other period. (A nominal registration fee 
will be collected). 

For further discussion please write the Secretary of the Seminar, 
Professor Geoffrey Beall, Statistical Laboratory, University of Connecti- 
cut, Storrs, Connecticut. Information will be provided on lodging and 
meals. A detailed outline of the program will be furnished. In addition 
to supplying information, the Secretary will welcome suggestions of 
topics for the Seminar or Clinic sessions. 
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INTERNATIONAL Conaress oF Matuematicrans 1954 


First Communication 


In the final plenary session of the International Congress of Mathe- 
maticians, 1950, held in Cambridge (Mass.) the Congress accepted the 
invitation of the delegation from the Netherlands to hold the next 
Congress in the Netherlands. 

The International Congress of Mathematicians 1954 will be held in 
Amsterdam from September 2nd to September 9th under the auspices of 
“Het Wiskundig Genootschap” (The Mathematical Society of the 
Netherlands). It is the sincere hope of the ““Wiskundig Genootschap”’ 
that the Congress 1954, which will be open to all mathematicians from 

~ all parts of the world, will be a fertile international gathering. _ 

The Organizing Committee has invited a number of outstanding 
mathematicians to deliver one-hour addresses, hoping that in this way a 
survey of the recent development in the whole field of mathematics may 
be furnished. 

There will be seven sections, viz: (1) Algebra and Theory of Num- > 
bers, (2) Analysis, (8) Geometry and Topology, (4) baal and 
Statistics, (5) Mathematical Physics and Applied Mathematics, (6) 
Logic and Foundations, and (7) Philosophy, History and E Adu 
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